Bias and Bias Correction in Multi-Site Instrumental Variables Analysis
of Heterogeneous Mediator Effects


Abstract 

We explore the use of instrumental variables (IV) analysis with a multi-site randomized trial to
estimate the effect of a mediating variable on an outcome in cases where it can be assumed that the
observed mediator is the only mechanism linking treatment assignment to outcomes, an
assumption known in the instrumental variables literature as the exclusion restriction. We use a
random-coefficient IV model that allows both the impact of program assignment on the mediator
(compliance with assignment) and the impact of the mediator on the outcome (the mediator effect)
to vary across sites and to co-vary with one another. This extension of conventional
fixed-coefficient IV analysis illuminates a potential bias in IV analysis which Reardon and Raudenbush
(forthcoming) refer to as "compliance-effect covariance bias." We first derive an expression for this
bias and then use simulations to investigate the sampling variance of the conventional
fixed-coefficient two-stage least squares (2SLS) estimator in the presence of varying (and co-varying)
compliance and treatment effects. We next develop two alternate IV estimators that are less
susceptible to compliance-effect covariance bias. We compare the bias, sampling variance, and root
mean squared error of these "bias-corrected IV estimators" to those of 2SLS and OLS. We find that,
when the first-stage F-statistic exceeds 10 (a commonly used threshold for instrument strength),
the bias-corrected estimators typically perform better than 2SLS or OLS. In the last part of the
paper we use both the new estimators and 2SLS to reanalyze data from two large multi-site studies.



I. Introduction 

The large number of randomized trials and regression discontinuity analyses conducted
during the past decade has produced internally valid estimates of the causal
effects of many different social and educational interventions on many different types of behaviors 
and outcomes for many different types of individuals. These findings provide a growing base of 
credible evidence about the effectiveness of specific interventions, which is beginning to play an 
important role in evidence-based policy making and practice. However, because the theories 
behind many interventions are not well-developed, and because many interventions have multiple 
components, it is generally more complicated to determine the mechanisms through which an 
intervention operates. 

Understanding the mechanisms through which an intervention operates requires 
identifying a set of hypothesized mediators through which the intervention operates, estimating the 
effects of the intervention on these mediators, and then estimating the effects of the mediators on 
the outcomes of interest. Although randomized experiments provide a straightforward method of 
estimating the effect of an intervention on a mediator, they do not provide as straightforward a 
method of obtaining unbiased estimates of the effect of a mediator on an outcome. This is both
because the mediators are not randomly assigned (which leads to selection bias) and because the
values of the mediators are often measured with error (which leads to measurement-error-induced
attenuation bias, also known as "errors-in-variables" bias). Under certain conditions, however,
instrumental variables (IV) methods can be used to obtain unbiased estimates of mediator effects in
randomized experiments or regression discontinuity analyses.

The intuition of the IV method is as follows. A randomized trial or regression discontinuity
analysis can provide an internally valid estimate of the effects of an assigned treatment ($T$) on an
outcome ($Y$) and on a mediator ($M$). In situations like this, the assigned treatment is an
"instrument" of exogenous change in both the mediator and the outcome. In the simplest case, if it
can be assumed that the full effect of the treatment on the outcome is produced by the mediator (an
assumption known as the "exclusion restriction"), the average effect of the mediator on the
outcome ($\Delta Y/\Delta M$) equals the ratio of the effect of the treatment on the outcome
($\Delta Y/\Delta T$) to the effect of the treatment on the mediator ($\Delta M/\Delta T$). Because the
randomized experiment or regression discontinuity design provides unbiased estimates of the latter
two effects, their ratio will be an (asymptotically) unbiased estimate of the effect of a unit change
in the mediator on the outcome ($\frac{\Delta Y/\Delta T}{\Delta M/\Delta T} = \frac{\Delta Y}{\Delta M}$).

Consider, for example, the recent multi-site impact evaluation of the federal Reading First
(RF) Program (Gamse et al., 2008) on reading achievement in the early elementary school grades.
Reading First's theory of change posits that the RF program would increase teachers' use of five
dimensions of reading instruction (phonemic awareness, phonics, vocabulary, fluency and
comprehension; hereafter referred to as "RF instructional methods"), and that this type of
instruction improves students' reading achievement. Because instructional methods were not
randomized in the RF study, we can use an IV analysis to test the latter hypothesis, under the
assumption that the only way that assignment to RF would affect student achievement was through
its effect on the amount of time teachers spent using the desired instructional methods. The results
of the RF impact study showed that on average, Reading First increased the amount of time that
teachers spent on RF instruction by 11.6 minutes per day ($\Delta M/\Delta T = 11.6$) and increased student
reading achievement by 4.29 scale score points ($\Delta Y/\Delta T = 4.29$). If all of Reading First's effect on reading
achievement is produced by its effect on the use of RF instructional methods, these findings imply
that the effect of such instruction is 0.37 scale-score points per additional instructional minute
($4.29/11.6 \approx 0.37$).
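
The arithmetic behind this single-instrument ratio can be sketched in a few lines of Python. The function name `wald_ratio` is ours and is introduced only for illustration; the two inputs are the Reading First impact estimates quoted above, and the calculation is valid only under the exclusion restriction.

```python
def wald_ratio(effect_on_outcome, effect_on_mediator):
    """Ratio-of-effects IV estimate: (effect of T on Y) / (effect of T on M)."""
    if effect_on_mediator == 0:
        raise ValueError("the treatment must shift the mediator")
    return effect_on_outcome / effect_on_mediator

# Reading First: 4.29 scale-score points per 11.6 additional minutes per day
rf_effect = wald_ratio(4.29, 11.6)
print(round(rf_effect, 2))  # 0.37
```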

The Reading First study was a multi-site trial, in which schools in 18 sites (17 school
districts and one statewide program) were assigned on the basis of a continuous rating score or by
randomization to receive the RF program or not. In a multi-site design, a more complex IV analysis
is possible. Because the treatment is ignorably assigned in each site, site-specific instruments can
be constructed by interacting treatment assignment with a zero/one indicator for each site. Such
"multiple-site, multiple instrument" IV analyses can have both advantages and disadvantages.

One potential advantage is an increase in precision that will occur if the effect of treatment 
assignment on the mediator varies substantially across sites. For example, if Reading First 
increased the use of RF instruction by 20 minutes per daily reading block in some sites and by 2 
minutes per daily reading block in other sites, an analysis that uses a separate instrument for each 
site can leverage this variation to provide more precise estimates of the average mediator effect. A 
second potential advantage of using a separate instrument for each site is that doing so may make it 
possible to study how the mediator effect varies across sites, if the sample sizes within each site are 
sufficiently large to enable precise estimates within each site. A third potential advantage of using a 
separate instrument for each site is that this makes it possible to study the separate effects of 
multiple mediators of a given intervention, as was done by Kling, Liebman and Katz (2007), Duncan,
Morris and Rodrigues (2011), and Nomi and Raudenbush (2012).

A potential disadvantage of using multiple site-by-treatment interactions as instruments is 
that, if the impacts of the treatment on the mediator do not vary significantly across sites, the use of 
multiple instruments may lead to substantially decreased precision and increased finite sample bias 
(Bound, Jaeger, & Baker, 1995; Hahn & Hausman, 2002; Stock & Yogo, 2005; Angrist & Pischke,
2009).

In this paper we investigate the magnitude of the bias of multiple-site, multiple instrument
instrumental variables estimators. We consider not only the role of finite sample bias, but also the
role of a second type of bias, what Reardon and Raudenbush (forthcoming) refer to as
"compliance-effect covariance bias." This bias arises if the effect of the treatment on the mediator and the effect
of the mediator on the outcome covary across sites (or persons, though in the present paper we are
concerned with between-site variation). 1 Reardon and Raudenbush (forthcoming) derive
expressions for the value of compliance-effect covariance bias under two-stage least squares (2SLS)
estimation of multiple-site, multiple instrument IV models with infinite samples, but do not
examine compliance-effect covariance bias in finite samples. In this paper we extend Reardon and
Raudenbush's analysis by deriving an expression for compliance-effect covariance bias of 2SLS in
finite samples. We then conduct a set of simulations that explore the sampling variance of 2SLS
estimates in the presence of compliance-effect covariance. We find that compliance-effect
covariance bias can be substantial, that it grows asymptotically with sample size (unlike finite
sample bias, which declines with sample size), and that conventional 2SLS standard errors
substantially underestimate the true sampling variance of the estimates when the effects of the
mediator are heterogeneous.

In the second half of the paper, we develop two "bias-corrected IV estimators" that are
designed to reduce bias caused by compliance-effect covariance across sites. We use simulations to
compare the statistical properties of these new estimators to those of 2SLS and OLS. These findings
indicate that under a wide range of conditions, the new estimators perform better than 2SLS and
OLS (in terms of bias and root mean squared error) if the instruments used have a first-stage
F-statistic greater than 10 (a commonly recommended threshold for defining sufficiently "strong"
instruments; see Staiger and Stock, 1997; Stock and Yogo, 2003).


1 The econometrics literature on instrumental variables analysis of correlated random coefficient models
(Heckman & Vytlacil, 1998) addresses an issue that differs somewhat from compliance-effect covariance bias.
Bias in correlated random coefficients models is produced by a correlation between the level of a mediator
and its per unit effect on an outcome of interest. This would occur, for example, if sites that used more of a
particular type of reading instruction experienced larger (smaller) effects on student reading achievement
per unit of the instruction than did sites that used less of the instruction. Compliance-effect covariance bias is
produced by a correlation between a treatment-induced change in the value of a mediator and its per unit
effect on an outcome of interest. This would occur, for example, if sites where treatment increased the specific
type of reading instruction by a lot experienced larger (smaller) effects per unit of the instruction on student
achievement than did sites where treatment increased the instruction by less.

The paper concludes with two examples of the application of the bias-corrected IV
estimators. We first use them to estimate the effect of class size on student achievement, using data
from the Tennessee class-size experiment, Project STAR. We then use them to reanalyze
data from the Reading First Impact Study described above, estimating the per unit effect of RF
instructional methods on students' reading achievement. These two empirical examples provide a
useful contrast of potential applications.

II. Bias in the 2SLS estimator 

Notation 

Consider a multi-site randomized trial, in which $N$ subjects (indexed by $i$) are nested in a set
of $K$ sites (indexed by $s \in \{1, 2, \ldots, K\}$). Within each site, a random sample of $n = N/K$ subjects
(which can be individuals, classrooms, or schools) are ignorably assigned to treatment condition
$T \in \{0, 1\}$. Let $p \in (0, 1)$ denote the proportion of subjects in each site assigned to the treatment
condition $T = 1$. Note that, for ease of exposition, we set $n$ and $p$ to be constant across sites.

In each site, treatment status is assumed to affect an outcome $Y$ through a single mediator
$M$. Both the person-specific effect of $T$ on $M$ (the person-specific "compliance," denoted $\Gamma$) and the
person-specific effect of $M$ on $Y$ (the person-specific "effect," denoted $\Delta$) may be heterogeneous
across subjects. Our goal is to estimate the average effect of $M$ on $Y$ in the population, denoted by
$\delta = E[\Delta]$.

Throughout the paper, we make several assumptions. First, we make a pair of "stable unit
treatment value assumptions," or SUTVA, described by Rubin (1986; see also Angrist, Imbens, and
Rubin, 1996, and Reardon and Raudenbush, forthcoming, for statements of the SUTVA assumptions
in the IV case). This is required so that the causal estimands are well-defined. We also assume that
$\mathrm{Cov}_s(\Gamma, \Delta) = [\mathrm{Cov}(\Gamma, \Delta) \mid S = s] = 0$ (no within-site compliance-effect covariance). Although implicit
in all IV models where the mediator is not binary, this assumption is not trivial in many cases. In
particular, it may be violated if individuals have some knowledge of the likely impact that $M$ will
have on them, and can choose levels of $M$ in response to $T$, as in the Roy model (Roy, 1951).
However, this assumption is met unambiguously if both $T$ and $M$ are binary and we focus only on
compliers (Reardon & Raudenbush, forthcoming). We assume no within-site compliance-effect
covariance in order to focus on a distinct type of bias that may arise in multi-site IV analyses. To
that end, we do not assume that the between-site compliance-effect covariance (denoted
$\mathrm{Cov}(\gamma_s, \delta_s)$, where the average compliance in site $s$ is denoted $\gamma_s$ and the average effect of $M$ on $Y$ in
site $s$ is denoted $\delta_s$) is zero; our focus in this paper is on the bias generated by non-zero covariance.

Within a given site $s$, let the data generating model be

$$M_i = \Lambda_s + \gamma_s T_i + e_i, \quad e_i \sim N(0, \sigma^2)$$

$$Y_i = \Theta_s + \delta_s M_i + u_i, \quad u_i \sim N(0, \omega^2)$$

where $\rho$ is the correlation between $e$ and $u$. Across sites, the means and covariance matrix of the
$\gamma_s$'s and the $\delta_s$'s are

$$\begin{pmatrix} \gamma_s \\ \delta_s \end{pmatrix} \sim \left[ \begin{pmatrix} \gamma \\ \delta \end{pmatrix}, \begin{pmatrix} \tau_\gamma & \tau_{\gamma\delta} \\ \tau_{\gamma\delta} & \tau_\delta \end{pmatrix} \right]. \tag{1}$$

Note that the intercepts $\Lambda_s$ and $\Theta_s$ here are conceived of as fixed (rather than random), and may be
correlated with one another and/or with $\gamma_s$ and $\delta_s$. These intercepts are irrelevant to the bias,
however, so it is not necessary to specify their structure.

Estimation 

We wish to estimate $\delta = E[\Delta]$. One approach would be to estimate $\delta_s = E[\Delta \mid S = s]$ in each
site separately, using standard instrumental variables methods, and then to average the $\delta_s$'s across
sites. There are several drawbacks to this approach, however. First, if the instrument is weak in
some sites, the estimated $\delta_s$ in those sites may be substantially biased due to finite sample bias,
leading to bias in the estimated average effect. Second, a precision-weighted average of the $\delta_s$ will
weight sites with greater compliance (larger values of $\gamma_s$) more, leading to biased estimates of $\delta$ if
$\tau_{\gamma\delta} \neq 0$ (see Raudenbush, Reardon, & Nomi, 2012).

A second approach would be to pool the data across sites and fit a just-identified site-fixed
effects IV model, using only a single instrument (Raudenbush, Reardon, & Nomi, 2012). If $\gamma_s$ is
heterogeneous, such a model will be inefficient because it will not make use of all the exogenous
variation in the mediator $M$ that is induced by the instrument.

A third approach is to pool the data and fit an over-identified IV model, using K site-by- 
treatment status interactions as instruments. As we noted above, such a model may be preferable 
to either of the two approaches above in some cases. Because these instruments may collectively 
account for much more variation than a single instrument, the overidentified model may be more 
efficient than the single instrument model. In addition, by pooling the data, bias due to weak 
instruments in individual sites may be avoided. 2 Moreover, unlike the two approaches above, 
which can only be used if there is a single mediator, the multiple site-by-treatment interaction IV 
model can be used to identify the effects of multiple mediators. Although we do not consider the 
multiple mediator case in this paper, our approach here may be adapted to that case. 

We implement this approach as follows: First, we construct $K$ instruments as site-by-treatment
status interactions. Denote these as $Z_i^s = D_i^s T_i$, where $D_i^s = 1$ if subject $i$ is in site $s$ and
$D_i^s = 0$ otherwise. Now the first-stage model is

$$M_i = \Lambda_s + \sum_{s=1}^{K} \gamma_s Z_i^s + e_i, \quad e_i \sim N(0, \sigma^2). \tag{2a}$$

2 On the other hand, if the extra instruments explain little additional variance in the mediator, using $K$
site-by-treatment assignment instruments may produce multiple weak instruments, leading to inefficient and biased
estimates (Chamberlain and Imbens, 2004; Staiger and Stock, 1997). A fourth possible approach is to use $J$
instruments, where $1 < J < K$, by interacting treatment status with indicators for $J$ subsets of the $K$ sites,
where the subsets are defined in such a way that there is little within-subset variation in compliance. We
take up this possibility later in the paper.


The second stage equation is

$$Y_i = \Theta_s + \delta M_i + u_i, \quad u_i \sim N(0, \omega^2). \tag{2b}$$
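
As an illustration of Equations (2a) and (2b), the following is a minimal sketch of 2SLS with site fixed effects and $K$ site-by-treatment instruments, written with NumPy. The simulated data, the parameter values, and the function names `simulate_sites` and `tsls_site_instruments` are assumptions made for this example only; they are not the data or code used in the paper.

```python
import numpy as np

def simulate_sites(K=5, n=200, p=0.5, rho=0.5, seed=0):
    """Hypothetical data: K sites, n subjects each, exactly n*p treated per site."""
    rng = np.random.default_rng(seed)
    site = np.repeat(np.arange(K), n)
    T = np.tile((np.arange(n) < int(n * p)).astype(float), K)
    gamma_s = rng.normal(1.0, 0.5, size=K)   # site-average compliance
    delta_s = rng.normal(1.0, 0.5, size=K)   # site-average mediator effect
    e = rng.normal(size=K * n)
    u = rho * e + np.sqrt(1 - rho**2) * rng.normal(size=K * n)
    M = gamma_s[site] * T + e                # site intercepts set to zero
    Y = delta_s[site] * M + u
    return site, T, M, Y

def tsls_site_instruments(site, T, M, Y):
    """2SLS estimate of delta using site dummies D and instruments Z = D * T."""
    K = int(site.max()) + 1
    D = (site[:, None] == np.arange(K)).astype(float)  # N x K site indicators
    Z = D * T[:, None]                                  # Z_i^s = D_i^s * T_i
    X1 = np.hstack([D, Z])                              # first stage, Eq. (2a)
    coef1, *_ = np.linalg.lstsq(X1, M, rcond=None)
    M_hat = X1 @ coef1                                  # fitted mediator values
    X2 = np.column_stack([D, M_hat])                    # second stage, Eq. (2b)
    coef2, *_ = np.linalg.lstsq(X2, Y, rcond=None)
    return coef2[-1]                                    # coefficient on M_hat

site, T, M, Y = simulate_sites()
delta_hat = tsls_site_instruments(site, T, M, Y)
print(delta_hat)
```

Only the point estimate is computed here; standard errors from a manual second-stage regression of this kind are not valid 2SLS standard errors and are deliberately omitted.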


Bias in OLS and 2SLS estimation 

Now let $F$ denote the population F-statistic (the expected value of the F-statistic
corresponding to the null hypothesis that $\gamma_s = 0 \ \forall s$ in the first-stage equation). We show in
Appendix A1 that this will be equal to

$$F = \frac{np(1-p)}{\sigma^2}\left(\gamma^2 + \tau_\gamma\right) + 1. \tag{3}$$
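
Equation (3) is simple to evaluate numerically. The sketch below assumes the form $F = [np(1-p)/\sigma^2](\gamma^2 + \tau_\gamma) + 1$; the function name `population_F` and the input values are hypothetical and chosen only so that the result lands on the conventional threshold of 10.

```python
def population_F(n, p, sigma2, gamma_bar, tau_gamma):
    """Population first-stage F-statistic from Equation (3).

    n: subjects per site; p: proportion treated; sigma2: first-stage error variance;
    gamma_bar: average compliance; tau_gamma: between-site variance of compliance.
    """
    return n * p * (1 - p) / sigma2 * (gamma_bar**2 + tau_gamma) + 1.0

# Illustrative values only: 100 subjects per site, half treated.
F_example = population_F(n=100, p=0.5, sigma2=1.0, gamma_bar=0.5, tau_gamma=0.11)
print(round(F_example, 6))  # 10.0
```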


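Estimating $\delta$ via OLS will lead to bias if $M_i$ is correlated with $u_i$ in Equation (2b). In Appendix A2,
we show that the OLS bias (the bias in the estimate of $\delta$ obtained from fitting Equation (2b) via
OLS) will be

$$E[\hat{\delta}^{OLS}] - \delta = \rho\frac{\omega}{\sigma}\left(\frac{n}{F + n - 1}\right) + \frac{2\gamma\tau_{\gamma\delta}}{\gamma^2 + \tau_\gamma}\left(\frac{F - 1}{F + n - 1}\right). \tag{4a}$$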


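Estimating $\delta$ via two-stage least squares (2SLS) will also result in bias. In particular, as we show in
Appendix A3, the 2SLS bias (the bias in the estimate of $\delta$ obtained from fitting Equations (2a) and
(2b) via 2SLS) is approximately

$$E[\hat{\delta}^{2SLS}] - \delta \approx \rho\frac{\omega}{\sigma}\left(\frac{1}{F}\right) + \frac{2\gamma\tau_{\gamma\delta}}{\gamma^2 + \tau_\gamma}\left(\frac{F - 1}{F}\right). \tag{5a}$$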


Note that both the OLS bias and the 2SLS bias have two components: one component that
depends on the covariance of the errors ($\rho$), and one component that depends on the covariance
between the gammas and deltas ($\tau_{\gamma\delta}$). The first component can be thought of as bias that arises
from treatment selection on levels (individuals' received value of $M$ is correlated with their potential
value of $Y$ that we would observe if they were assigned $M = 0$); it gives rise to selection bias in OLS
and finite sample bias in IV estimators. The second component can be thought of as bias that arises
from compliance selection on site-average effects (site-average compliance with the instrument is
correlated with the site-average effect the mediator has on the outcome $Y$), as might be predicted
by the Roy model (Roy, 1951; Borjas, 1987); it gives rise to what we refer to as compliance-effect
covariance bias (Reardon & Raudenbush, forthcoming). Equations (4a) and (5a) make clear that
both OLS and 2SLS are biased in finite samples if either $\rho \neq 0$ or $\tau_{\gamma\delta} \neq 0$. Moreover, both the OLS
and 2SLS biases can be written as weighted averages of the two components:

$$E[\hat{\delta}^{OLS}] - \delta = \rho\frac{\omega}{\sigma}\left(1 - \lambda^{OLS}\right) + \frac{2\gamma\tau_{\gamma\delta}}{\gamma^2 + \tau_\gamma}\left(\lambda^{OLS}\right) \tag{4b}$$

and

$$E[\hat{\delta}^{2SLS}] - \delta \approx \rho\frac{\omega}{\sigma}\left(1 - \lambda^{2SLS}\right) + \frac{2\gamma\tau_{\gamma\delta}}{\gamma^2 + \tau_\gamma}\left(\lambda^{2SLS}\right), \tag{5b}$$

where $\lambda^{OLS} = \frac{F - 1}{F + n - 1}$ and $\lambda^{2SLS} = \frac{F - 1}{F}$. In the case of OLS, the weighting depends on the relative
magnitudes of $F$ and $n$. If $n \gg F$, $\lambda^{OLS}$ approaches 0, in which case the bias due to the correlation of
the errors is most significant. In the case of 2SLS, however, the weight depends only on the
magnitude of $F$. When $F$ is large, bias due to the correlation of the errors (finite sample bias) is
minimized and bias due to the correlation of $\gamma_s$ and $\delta_s$ plays a dominant role. Because $\lambda^{2SLS} > \lambda^{OLS}$
for $n > 1$, the bias due to the second component will always get more weight in the 2SLS estimator
than in the OLS estimator. However, the total bias will depend not just on these weights but on the
relative magnitude of the two bias components. Thus, it is not a priori clear whether 2SLS yields
less bias than OLS.
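
The weighted-average form of Equations (4b) and (5b) can be sketched as follows. The sketch takes the compliance-effect component to be $2\gamma\tau_{\gamma\delta}/(\gamma^2 + \tau_\gamma)$, the form that appears in Equation (6), and the weights to be $\lambda^{OLS} = (F-1)/(F+n-1)$ and $\lambda^{2SLS} = (F-1)/F$. Function names and parameter values are hypothetical.

```python
def lambda_ols(F, n):
    """Weight on the compliance-effect component in the OLS bias."""
    return (F - 1) / (F + n - 1)

def lambda_2sls(F):
    """Weight on the compliance-effect component in the 2SLS bias."""
    return (F - 1) / F

def weighted_bias(lam, rho, omega, sigma, gamma_bar, tau_gamma, tau_gd):
    """Bias as a weighted average of the error-correlation and covariance components."""
    error_component = rho * omega / sigma
    covariance_component = 2 * gamma_bar * tau_gd / (gamma_bar**2 + tau_gamma)
    return (1 - lam) * error_component + lam * covariance_component

# Illustrative values only (not taken from Table 1).
params = dict(rho=0.5, omega=1.0, sigma=1.0, gamma_bar=1.0, tau_gamma=1.0, tau_gd=0.5)
F, n = 10.0, 100
bias_ols = weighted_bias(lambda_ols(F, n), **params)
bias_2sls = weighted_bias(lambda_2sls(F), **params)
print(bias_ols, bias_2sls)
```

In this particular configuration both components equal 0.5, so the two estimators have the same predicted bias regardless of the weights; changing `rho` or `tau_gd` shows how the comparison can tip either way.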

Factors contributing to bias in the 2SLS estimator

The first component of bias in Equation (5a) is pure finite sample bias. This bias term is
proportional to the within-site correlation of the error terms in the first and second stage equations
and inversely proportional to $F$. As $F$ gets large, finite sample bias becomes trivial.

The second component of the bias in (5a) is compliance-effect covariance bias. If $\gamma = 0$, this
bias term is 0. 3 If, however, $\gamma \neq 0$, we can write the compliance-effect covariance bias term as

$$\frac{2\gamma\tau_{\gamma\delta}}{\gamma^2 + \tau_\gamma}\left(\frac{F - 1}{F}\right) = 2\,\mathrm{Corr}(\gamma_s, \delta_s)\sqrt{\tau_\delta}\left(\frac{CV_\gamma}{1 + CV_\gamma^2}\right)\left(\frac{F - 1}{F}\right), \tag{6}$$

where $CV_\gamma = \sqrt{\tau_\gamma}/\gamma$ is the coefficient of variation of $\gamma_s$.

The compliance-effect covariance bias component depends on four factors. First, the bias
term is proportional to the correlation between $\gamma_s$ and $\delta_s$. Second, the bias term is proportional to
the standard deviation of the $\delta_s$'s across sites. Third, the bias depends on the amount of
between-site variation in compliance relative to the magnitude of the average compliance across sites.
Holding constant $\mathrm{Corr}(\gamma_s, \delta_s)$, $\tau_\delta$, and $F$, the magnitude of the compliance-effect covariance bias is
maximized when $|CV_\gamma| = 1$ (see Appendix A4). As $CV_\gamma$ approaches 0 (in which case the compliance
is homogeneous across sites) or $\pm\infty$ (i.e., as the average compliance across sites goes to 0), the
compliance-effect covariance bias term goes to 0. And fourth, the compliance-effect covariance bias
is smaller when $F$ is small. When the instruments are collectively strong, the bias due to
between-site compliance-effect covariance is maximized. 4 Thus, compliance-effect covariance can lead to
bias in the 2SLS estimator even with an arbitrarily strong set of instruments.
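
A short numerical check of the third factor: the sketch below evaluates the right-hand side of Equation (6) over a grid of $CV_\gamma$ values, holding the other three factors fixed, and locates the maximum. The function name `cec_bias` and the chosen values are assumptions for illustration.

```python
import numpy as np

def cec_bias(corr, tau_delta, cv_gamma, F):
    """Compliance-effect covariance bias term, right-hand side of Equation (6)."""
    return (2 * corr * np.sqrt(tau_delta)
            * cv_gamma / (1 + cv_gamma**2)
            * (F - 1) / F)

cv_grid = np.linspace(0.1, 5.0, 50)          # 0.1, 0.2, ..., 5.0
bias_grid = cec_bias(corr=0.5, tau_delta=1.0, cv_gamma=cv_grid, F=10.0)
cv_at_max = cv_grid[np.argmax(bias_grid)]
print(cv_at_max, bias_grid.max())            # maximum occurs at CV = 1
```

Because $CV_\gamma/(1 + CV_\gamma^2)$ never exceeds 1/2, the bias term is bounded in magnitude by $|\mathrm{Corr}(\gamma_s, \delta_s)|\sqrt{\tau_\delta}\,(F-1)/F$.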


3 To see this, note that $F = \frac{np(1-p)}{\sigma^2}(\gamma^2 + \tau_\gamma) + 1$, so we can write the compliance-effect bias term as
$2\frac{np(1-p)}{\sigma^2}\gamma\,\mathrm{Cov}(\gamma_s, \delta_s)\left(\frac{1}{F}\right)$, so $\gamma = 0$ implies the bias is zero.

4 Note that if $\gamma \neq 0$, $F = \frac{np(1-p)}{\sigma^2}\gamma^2(1 + CV_\gamma^2) + 1$, i.e., $F$ depends on $n$, $p$, $\sigma^2$, $\gamma$, and $CV_\gamma$. Therefore, changing $F$
by changing $CV_\gamma$ will affect compliance-effect covariance bias in two ways, while changes in $F$ due to changes
in $n$, $p$, $\sigma^2$, or $\gamma$, holding $CV_\gamma$ constant, will only affect compliance-effect covariance bias through their effect on
$F$.

Each of the four factors influencing the compliance-effect covariance bias component is, in
principle, estimable from the observed data (although estimation of $\tau_\delta$ and $\mathrm{Corr}(\gamma_s, \delta_s)$ will be
complicated by finite sample bias in the estimation of the $\delta_s$'s). The correlation between the
first- and second-stage error terms is not estimable from the observed data, however. When $F$ is large,
the contribution of finite sample bias to the overall bias is negligible. This suggests that
we may be able to devise a better estimator of $\delta$ (one that is less biased by compliance-effect
covariance than 2SLS), at least for the case where $F$ is relatively large. In Part IV of this paper, we
develop two such estimators.

Equation (5a) provides an approximation to the bias induced by the combination of finite
within-site samples and compliance-effect covariance. However, Equation (5a) does not describe
the sampling variance of the 2SLS estimator in the presence of compliance and effect heterogeneity,
compliance-effect covariance, and finite within-site samples. It is well-known that 2SLS yields
standard errors that are too small when there are many weak instruments, but these results have
been developed under the assumption that $\delta_s$ is constant across sites (Chamberlain and Imbens,
2004; Angrist and Pischke, 2009). In the following section, we conduct a set of simulation analyses
to describe the sampling variance of the OLS and 2SLS estimators in the presence of heterogeneous
compliance and effect.

III. Simulation Analyses 

This section presents results from a series of simulations conducted with three goals: (i) to
test whether the 2SLS bias formula presented in Equation (5a) is accurate (since it is based on an
approximation) and to examine the extent of 2SLS bias that exists under a range of conditions; (ii)
to assess the sampling variation of the 2SLS estimator in the presence of compliance and effect
heterogeneity and compliance-effect covariance; and (iii) to compare the magnitude of the bias and
the root mean squared error (RMSE) of the 2SLS estimator relative to the OLS estimator. To
simplify matters, the within-site variances of the individual compliance and effect parameters are set
to zero; therefore, these simulations focus on variation and covariance of $\gamma_s$ and $\delta_s$ across sites, not
within sites. Appendix B provides a more detailed description of the simulation set-up.

Results of the simulations are shown in Table 1. In each panel of Table 1, one of the four key
parameters that influence the bias and sampling variability of the 2SLS estimator ($CV_\gamma$; the
expected first-stage F-statistic; the compliance-effect correlation, $\mathrm{Corr}(\gamma_s, \delta_s)$; and the variance of
the effect, $\tau_\delta$) is systematically manipulated while the other three are held constant (see Appendix
B for details). Note that, except for Panel B, we set the F-statistic to 10. Columns 5-14 report the
results obtained from 2,000 simulation samples drawn from a population of sites generated
according to the parameter values shown in columns 1-4. In each case, the simulated data are
generated based on a true effect of $\delta = 1$, so the bias reported in columns 5 and 6 of Table 1 can be
interpreted as the ratio of the bias to the magnitude of the true effect.
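
A simulation of this general kind can be sketched as follows. This is not the set-up of Appendix B: the number of sites, the within-site sample size, the parameter values, and the function names are all assumptions chosen for a quick illustration, and $\gamma_s$ and $\delta_s$ are drawn from a bivariate normal distribution. With equal $n$ and $p$ in every site, the 2SLS estimate with site fixed effects and site-by-treatment instruments reduces to $\sum_s \hat{\gamma}_s\hat{\beta}_s / \sum_s \hat{\gamma}_s^2$, where $\hat{\gamma}_s$ and $\hat{\beta}_s$ are the within-site treatment-control differences in mean $M$ and mean $Y$; the code uses that shortcut.

```python
import numpy as np

def predicted_2sls_bias(rho, omega, sigma, gamma, tau_g, tau_gd, n, p):
    """Approximate 2SLS bias, taking Equation (5a) with F from Equation (3)."""
    F = n * p * (1 - p) / sigma**2 * (gamma**2 + tau_g) + 1
    return (rho * omega / sigma / F
            + 2 * gamma * tau_gd / (gamma**2 + tau_g) * (F - 1) / F)

def simulate_2sls(reps=1000, K=50, n=40, p=0.5, gamma=1.0, delta=1.0,
                  tau_g=0.25, tau_d=0.25, corr=0.5, sigma=1.0, omega=1.0,
                  rho=0.5, seed=0):
    """Return (simulated bias, sampling SD) of the 2SLS estimate of delta."""
    rng = np.random.default_rng(seed)
    tau_gd = corr * np.sqrt(tau_g * tau_d)
    cov = [[tau_g, tau_gd], [tau_gd, tau_d]]
    n1 = int(n * p)                                # treated subjects per site
    T = (np.arange(n) < n1).astype(float)          # first n1 subjects treated
    estimates = np.empty(reps)
    for r in range(reps):
        gd = rng.multivariate_normal([gamma, delta], cov, size=K)
        g_s, d_s = gd[:, [0]], gd[:, [1]]          # each of shape (K, 1)
        e = rng.normal(0.0, sigma, size=(K, n))
        z = rng.normal(size=(K, n))
        u = omega * (rho * e / sigma + np.sqrt(1 - rho**2) * z)  # corr(e, u) = rho
        M = g_s * T + e                            # site intercepts set to zero
        Y = d_s * M + u
        g_hat = M[:, :n1].mean(axis=1) - M[:, n1:].mean(axis=1)
        b_hat = Y[:, :n1].mean(axis=1) - Y[:, n1:].mean(axis=1)
        estimates[r] = (g_hat * b_hat).sum() / (g_hat**2).sum()
    return estimates.mean() - delta, estimates.std(ddof=1)

sim_bias, sim_sd = simulate_2sls()
pred_bias = predicted_2sls_bias(rho=0.5, omega=1.0, sigma=1.0, gamma=1.0,
                                tau_g=0.25, tau_gd=0.125, n=40, p=0.5)
print(sim_bias, pred_bias, sim_sd)
```

Under these assumed values the population F-statistic is 13.5 and the predicted bias is about 0.22; the simulated bias should fall close to that figure, with some discrepancy expected from the finite number of sites and replications.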

Magnitude of the estimated 2SLS bias in the presence of compliance-effect covariance in finite samples 

In Table 1, column 5 reports the predicted 2SLS bias as computed from Equation (5a).
Column 6 reports the estimated 2SLS bias from the simulations (the difference between the average
2SLS estimate over the 2,000 simulations and the true effect). In each case, the estimated bias in
column 6 is very close to that predicted by Equation (5a). As expected, the bias is larger when $CV_\gamma$
is near 1; when $F$ is small; when the correlation of $\gamma_s$ and $\delta_s$ is large; and when the variance of $\delta_s$ is
large. One key lesson from Table 1 is that 2SLS bias can be substantial, even when $F > 10$,
particularly when the absolute value of the compliance-effect correlation is large or the variance of
$\delta_s$ is large (see rows 11, 15, and 19).

Sampling variability of the 2SLS estimator in the presence of compliance-effect covariance in finite 
samples 

Table 1 reports both the true sampling variation (column 7; the standard deviation of the
2SLS estimates of $\delta$ across the 2,000 simulation samples) and the average standard error reported
by conventional 2SLS estimation algorithms (column 8). These conventional 2SLS-estimated
standard errors are based on the assumption that $\delta_s$ is constant across sites. Equation (B14) in
Reardon and Raudenbush (forthcoming), however, implies that the sampling variance of the 2SLS
estimator depends on the variance of $\delta_s$; assuming that $\tau_\delta = 0$ will lead one to underestimate the
sampling variance of the 2SLS estimator. This is evident in comparing columns 7 and 8 in Table 1.
The true sampling variance of the 2SLS estimates is generally much larger than that implied by the
conventional 2SLS-estimated standard errors. Only in row 16, where $\tau_\delta$ is set to zero, does the
2SLS standard error appropriately match the true sampling variance of the estimator. Note that
this result is not merely due to the fact that, in over-identified 2SLS models, the estimated standard
errors are often too small, especially when the instruments are collectively weak (Chamberlain and
Imbens, 2004; Angrist and Pischke, 2009). Even in row 10, where the F-statistic is 101, the
2SLS-estimated standard error is one-tenth of the true sampling standard deviation. We conclude that
conventional 2SLS-estimated standard errors may substantially underestimate the sampling
variance of the estimates when the mediator effect is heterogeneous.

Comparing 2SLS and OLS estimators in the presence of compliance-effect covariance in finite samples

It is useful to compare the performance of the 2SLS estimator to the OLS estimator in the
presence of compliance-effect covariance. Table 1 includes five columns that make this comparison
possible: columns 10 and 11 report the predicted OLS bias (based on Equation 4a) and the
estimated OLS bias, respectively; column 12 reports the true OLS sampling variation (the standard
deviation of the OLS estimates across the 2,000 simulation samples in each case); column 13
reports the average reported OLS-estimated standard error across the 2,000 samples; and column
14 shows the root mean squared error (RMSE) for OLS (the square root of the sum of the squares of
columns 11 and 12). These results lead to three observations: First, for the range of the parameters
tested, OLS bias tends to be larger than 2SLS bias. Second, the average OLS-estimated standard
error substantially underestimates the true variability of the OLS estimator (unless $\tau_\delta = 0$), which
tends to be smaller than the true variability of the 2SLS estimator. Finally, the RMSE for the OLS
estimator tends to be larger than the RMSE of the 2SLS estimator because the OLS bias is generally
larger than the 2SLS bias even though OLS estimates are more precise than 2SLS. Note that this
does not apply to cases where the 2SLS bias is larger than the OLS bias due to a large
compliance-effect covariance (e.g., rows 11, 15, and 19).
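
The RMSE used in this comparison combines bias and sampling variability as the square root of the sum of their squares. A minimal sketch follows; the input values are hypothetical and are not taken from Table 1.

```python
import math

def rmse(bias, sampling_sd):
    """Root mean squared error from an estimator's bias and sampling SD."""
    return math.sqrt(bias**2 + sampling_sd**2)

# Hypothetical estimators: one less biased but noisier, one more biased but more precise.
rmse_low_bias = rmse(bias=0.05, sampling_sd=0.20)
rmse_high_bias = rmse(bias=0.30, sampling_sd=0.05)
print(rmse_low_bias, rmse_high_bias)
```

In this made-up case the more precise estimator still has the larger RMSE because its bias dominates, which is the pattern described above for OLS relative to 2SLS.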

We draw three primary conclusions from the described simulation analysis. First,
Equations (4a) and (5a) provide good approximations of the OLS and 2SLS biases in finite samples
and in the presence of site-level compliance-effect covariance. Second, even when the instruments
are collectively strong, conventional 2SLS-estimated standard errors substantially underestimate
sampling variance when the mediator effect is heterogeneous across sites. Third, unless
compliance-effect covariance bias is large, the 2SLS estimator generally has less bias but larger
sampling variance than the OLS estimator; consequently, the RMSE for the OLS estimator tends to
be larger than that of the 2SLS estimator. Although the presence of compliance-effect covariance
leads to some bias, it may generally not be so large as to render 2SLS less desirable than OLS.

IV. A Bias-Corrected Multi-Site Single Mediator IV Estimator 

In Section II we demonstrated that 2SLS yields biased estimates of the average effect of M 
when there is between-site compliance-effect covariance, even if F is arbitrarily large. As we 
suggested there, however, because the magnitude of compliance-effect covariance bias may be 
estimable from the observed data under certain conditions, it may be possible to develop a method 
of correcting the 2SLS estimates to eliminate this bias. 

To build some intuition regarding our approach, consider the hypothetical data described in
Figure 1 below. Each of the panels on the left side of the figure shows the relationship between
$\delta_s$ and $\gamma_s$. In each case, $\delta$ (the average value of $\delta_s$ across sites) equals 1. Likewise, in each
case, the average compliance across sites equals 1, and both $\gamma_s$ and $\delta_s$ have a variance of 1. This
implies that $CV_\gamma = 1$, so these figures correspond to cases in which compliance-effect covariance
bias is maximized (for a given value of F, $\mathrm{Corr}(\gamma_s, \delta_s)$, and $\tau_\delta$). The three figures on the left side
differ only in the correlation between $\delta_s$ and $\gamma_s$, ranging from $\mathrm{Corr}(\delta_s, \gamma_s) = -0.50$ to
$\mathrm{Corr}(\delta_s, \gamma_s) = +0.50$.

Under the assumptions that treatment affects the outcome only through the mediator
(exclusion restriction) and there is no within-site compliance-effect covariance, the average
intent-to-treat effect on the outcome within a site s will be $\beta_s = \gamma_s \delta_s$. The figures on the right
side plot these computed ITT effects against the $\gamma_s$'s. In practice, we can estimate the $\beta_s$'s and
the $\gamma_s$'s, so we can readily produce figures of the type shown here. Note that a non-zero
correlation between $\gamma_s$ and $\delta_s$ will produce a figure on the right that shows a non-linear
association between $\beta_s$ and $\gamma_s$. This is evident in the quadratic fitted curves in the right-hand
figures. Thus, non-linearity in the observed relationship between $\beta_s$ and $\gamma_s$ is informative
regarding the extent of compliance-effect covariance across sites, and so may be useful in
developing a bias-corrected estimator.
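
The pattern described for Figure 1 can be reproduced with a small simulation. The sketch below is illustrative only: it assumes bivariate-normal site parameters with the means and variances of the stylized example, and the function name `simulate_sites`, the seed, and the number of sites are hypothetical choices rather than anything taken from the paper.

```python
import numpy as np

def simulate_sites(n_sites, corr, seed=0):
    """Draw site-level compliance (gamma_s) and mediator effects (delta_s).

    Both have mean 1 and variance 1, as in the stylized example;
    bivariate normality is an assumption made for this sketch.
    """
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, corr], [corr, 1.0]])
    gamma, delta = rng.multivariate_normal([1.0, 1.0], cov, size=n_sites).T
    beta = gamma * delta  # site ITT effect under the exclusion restriction
    return gamma, delta, beta

for rho in (-0.5, 0.0, 0.5):
    gamma, delta, beta = simulate_sites(5000, rho)
    # Quadratic fit of beta_s on gamma_s; polyfit lists the highest power first.
    curvature = np.polyfit(gamma, beta, deg=2)[0]
    print(f"Corr = {rho:+.2f}: fitted quadratic coefficient = {curvature:+.3f}")
```

Under these assumptions the fitted quadratic coefficient should have the same sign as the compliance-effect correlation and should be close to zero when the correlation is zero.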

2SLS is equivalent to a linear regression of $\beta_s$ on $\gamma_s$ (albeit with no intercept, as the
exclusion restriction requires that $\beta_s = 0$ when $\gamma_s = 0$), weighting each site by its sample size and
the variance of the instrument within each site (Reardon & Raudenbush, forthcoming; Raudenbush,
Reardon, & Nomi, 2012). 5 The slope of this line is the 2SLS IV estimate of $\delta$. Recall that the average
value of $\delta_s$ is 1, so an unbiased estimate would yield a slope of 1, as shown by the solid line in the
figures. The results of the 2SLS regression are shown by the dashed line. Note that when
$\mathrm{Corr}(\delta_s, \gamma_s) > 0$, the slope of the fitted line is substantially greater than 1; when
$\mathrm{Corr}(\delta_s, \gamma_s) < 0$, the slope of the line is substantially less than 1. The reason for this is that the
sites where $\gamma_s$ is largest in magnitude (farthest from 0) have more leverage in the regression; the
correlation between $\gamma_s$ and $\delta_s$ means that these sites also have larger (or smaller) than average
$\delta_s$'s, leading to biased estimates.

5 Angrist (1990) does this graphically, in a way that is equivalent to weighting each site by the variance of the
treatment; in the stylized example here, we assume all sites have equal instrument variance and equal sample
size.
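
The leverage argument can be checked numerically. The following is a minimal sketch, assuming error-free site-level values of $\gamma_s$ and $\beta_s$, equal site weights, and bivariate-normal site parameters with mean 1 and variance 1; the helper name `no_intercept_slope` and the simulation settings are hypothetical.

```python
import numpy as np

def no_intercept_slope(gamma, beta, weights=None):
    """Weighted least-squares slope of beta_s on gamma_s through the origin."""
    gamma = np.asarray(gamma, dtype=float)
    beta = np.asarray(beta, dtype=float)
    w = np.ones_like(gamma) if weights is None else np.asarray(weights, dtype=float)
    return np.sum(w * gamma * beta) / np.sum(w * gamma ** 2)

rng = np.random.default_rng(1)
for rho in (-0.5, 0.0, 0.5):
    cov = [[1.0, rho], [rho, 1.0]]
    # Many sites are used so that the slope is close to its large-sample value.
    gamma, delta = rng.multivariate_normal([1.0, 1.0], cov, size=200_000).T
    slope = no_intercept_slope(gamma, gamma * delta)
    print(f"Corr = {rho:+.2f}: slope = {slope:.2f} (average delta_s is 1)")
```

With these settings the large-sample slope is 1 plus the compliance-effect correlation, so the slope exceeds the average effect of 1 when the correlation is positive and falls below it when the correlation is negative.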


Two Bias-Corrected Estimators 

We now develop two bias-corrected IV estimators. First, assume that the association
between $\gamma_s$ and $\delta_s$ is linear:

$$\delta_s = \alpha_0 + \alpha_1 \gamma_s + v_s, \qquad v_s \sim N[0, \sigma^2]. \tag{7}$$

Note that this assumption is weaker than the assumption that $\mathrm{Cov}(\gamma_s, \delta_s) = 0$. We can, in
principle, relax the linearity assumption further, and allow the relationship between $\gamma_s$ and $\delta_s$ to
be described by some higher-order polynomial. Equation (8) would then include a set of terms
involving the expected values of the higher-order powers of $\gamma_s$. This would result in a
higher-order regression model in Equation (12) below.

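Taking the expectation of both sides of Equation (7) yields

$$
\begin{aligned}
E[\delta_s] &= \alpha_0 + \alpha_1 E[\gamma_s] \\
\delta &= \alpha_0 + \alpha_1 \gamma.
\end{aligned}
\tag{8}
$$

Equation (8) suggests a bias-corrected estimator for $\delta$. Specifically, if we can estimate $\alpha_0$, $\alpha_1$,
and $\gamma$, we can estimate $\delta$ as

$$\hat{\delta}^{bc} = \hat{\alpha}_0 + \hat{\alpha}_1 \hat{\gamma}. \tag{9}$$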


We can construct a second bias-corrected estimator by directly estimating the 2SLS
compliance-effect covariance bias and subtracting it from the 2SLS estimate. Note that Equation (7)
implies that $\alpha_1 = \tau_{\gamma\delta}/\tau_\gamma$; the compliance-effect covariance bias (from Equation 5a) is therefore

$$\frac{2\gamma\alpha_1\tau_\gamma}{\gamma^2 + \tau_\gamma}\left(\frac{F - 1}{F}\right). \tag{10}$$

Thus, if we can estimate F, $\alpha_1$, $\gamma$, and $\tau_\gamma$, we can construct a plug-in bias-corrected estimator:

$$\hat{\delta}^{pi} = \hat{\delta}^{2SLS} - \frac{2\hat{\gamma}\hat{\alpha}_1\hat{\tau}_\gamma}{\hat{\gamma}^2 + \hat{\tau}_\gamma}\left(\frac{\hat{F} - 1}{\hat{F}}\right). \tag{11}$$
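
As a rough illustration of how Equations (9) through (11) combine their inputs, the sketch below codes the two estimators as plain functions. The function names and all numeric inputs are hypothetical and serve only to show the arithmetic; in practice the inputs would be the estimates described in the text that follows.

```python
def ce_covariance_bias(gamma, alpha1, tau_gamma, F):
    """Compliance-effect covariance bias of 2SLS, following Equation (10)."""
    return (2.0 * gamma * alpha1 * tau_gamma) / (gamma ** 2 + tau_gamma) * (F - 1.0) / F

def delta_bc(alpha0_hat, alpha1_hat, gamma_hat):
    """Bias-corrected estimator, Equation (9)."""
    return alpha0_hat + alpha1_hat * gamma_hat

def delta_pi(delta_2sls, gamma_hat, alpha1_hat, tau_gamma_hat, F_hat):
    """Plug-in estimator, Equation (11): 2SLS minus the estimated bias."""
    return delta_2sls - ce_covariance_bias(gamma_hat, alpha1_hat, tau_gamma_hat, F_hat)

# Hypothetical inputs, chosen only to show the arithmetic.
bias = ce_covariance_bias(gamma=1.0, alpha1=0.5, tau_gamma=1.0, F=10.0)
print("estimated compliance-effect covariance bias:", bias)
print("plug-in estimate:", delta_pi(1.45, 1.0, 0.5, 1.0, 10.0))
print("bias-corrected estimate:", delta_bc(0.5, 0.5, 1.0))
```

When the estimated $\alpha_1$ is zero, the estimated bias is zero and the plug-in estimate coincides with the 2SLS estimate.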

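To construct $\hat{\delta}^{bc}$ and $\hat{\delta}^{pi}$, we must estimate F, $\alpha_0$, $\alpha_1$, $\gamma$, and $\tau_\gamma$. First, we can estimate
$\gamma$, $\tau_\gamma$, and F from the following random-coefficients model: 6

$$
M_i = \Lambda_s + \gamma_s T_i + e_i, \qquad
\begin{pmatrix} \Lambda_s \\ \gamma_s \end{pmatrix} \sim
N\left[\begin{pmatrix} \Lambda \\ \gamma \end{pmatrix},
\begin{pmatrix} \tau_\Lambda & \tau_{\gamma\Lambda} \\ \tau_{\gamma\Lambda} & \tau_\gamma \end{pmatrix}\right].
\tag{12}
$$

Now, we note that

$$
\begin{aligned}
\beta_s &= E[B \mid S = s] \\
&= E[\Gamma\Delta \mid S = s] \\
&= E[\Gamma \mid S = s] \cdot E[\Delta \mid S = s] + \mathrm{Cov}(\Gamma, \Delta) \mid S = s \\
&= \gamma_s \delta_s + \mathrm{Cov}(\Gamma, \Delta) \mid S = s.
\end{aligned}
\tag{13}
$$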

Given the assumption of no within-site compliance-effect covariance, substituting Equation (7) into
(13) yields

$$
\begin{aligned}
\beta_s &= \gamma_s \delta_s \\
&= \gamma_s(\alpha_0 + \alpha_1 \gamma_s + v_s) \\
&= \alpha_0 \gamma_s + \alpha_1 \gamma_s^2 + \gamma_s v_s, \qquad v_s \sim N[0, \sigma^2].
\end{aligned}
\tag{14}
$$

6 We can compute F from the estimates of $\gamma$, $\tau_\gamma$, and $\sigma^2$ using Equation (3). In practice, if $\tau_\gamma$ is small relative
to the sampling variance of the $\hat{\gamma}_s$'s, fitting a random coefficient model like (12) may not be possible, because
the maximum-likelihood algorithm may not converge. In such cases, however, there is little or no need to use
a random coefficient model; a fixed effects IV model (a model with a single instrument) would be preferable.
We could also fit (12) using site-fixed effects and site-by-treatment assignment interactions via OLS, and then
shrink the resulting $\hat{\gamma}_s$'s, as described following Equation (15).

In other words, under the assumption that $\delta_s$ is linearly related to $\gamma_s$, $\beta_s$ can be written as a
quadratic function of $\gamma_s$, passing through the origin, with a heteroskedastic error term. The
parameters $\alpha_0$ and $\alpha_1$ can be estimated by fitting this model to the $\beta_s$'s and $\gamma_s$'s.
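
A minimal sketch of this fit, assuming the site-level values are observed without error: regress $\beta_s$ on $\gamma_s$ and $\gamma_s^2$ with no intercept. The function name is hypothetical, and ordinary (unweighted) least squares is used for simplicity, so the heteroskedasticity of the error term is ignored here.

```python
import numpy as np

def fit_quadratic_through_origin(gamma, beta):
    """Least-squares fit of beta_s = a0 * gamma_s + a1 * gamma_s**2 (no intercept)."""
    gamma = np.asarray(gamma, dtype=float)
    X = np.column_stack([gamma, gamma ** 2])  # no column of ones: curve passes through 0
    coef, *_ = np.linalg.lstsq(X, np.asarray(beta, dtype=float), rcond=None)
    return coef[0], coef[1]

# Noise-free check with hypothetical parameters alpha0 = 0.75 and alpha1 = 0.25.
gamma = np.linspace(0.2, 2.0, 10)
beta = 0.75 * gamma + 0.25 * gamma ** 2
a0, a1 = fit_quadratic_through_origin(gamma, beta)
print("alpha0 estimate:", a0, "alpha1 estimate:", a1)
```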

Although the assumption that T is ignorably assigned within sites ensures that we can
obtain unbiased estimates of the $\beta_s$'s and $\gamma_s$'s, two factors will complicate the estimation of $\alpha_0$ and
$\alpha_1$ from the observed data. First, we do not observe $\beta_s$ and $\gamma_s$; rather, we estimate them and so
observe $\hat{\beta}_s = \beta_s + b_s$ and $\hat{\gamma}_s = \gamma_s + g_s$. Regressing $\hat{\beta}_s$ on $\hat{\gamma}_s$ and $\hat{\gamma}_s^2$ will yield biased estimates of
$\alpha_0$ and $\alpha_1$ because of the error in $\hat{\gamma}_s$. Second, in finite samples, the correlation between e and u
(the errors in the first and second-stage equations) will induce a correlation between $b_s$ and $g_s$, as
will $\delta \neq 0$ (see Equation A3.5 in Appendix A3); this will induce bias in the estimates of $\alpha_0$ and $\alpha_1$.
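
One simple way to obtain the site-level estimates is to take within-site differences in means between treatment and control cases. The sketch below assumes that estimator, which may differ from the one used in the paper, and assumes every site has at least one treated and one control case; the function name and the toy data are hypothetical.

```python
import numpy as np

def site_itt_estimates(site, T, M, Y):
    """Per-site differences in means: gamma_hat_s (for M) and beta_hat_s (for Y)."""
    site, T, M, Y = map(np.asarray, (site, T, M, Y))
    sites = np.unique(site)
    gamma_hat = np.empty(len(sites))
    beta_hat = np.empty(len(sites))
    for k, s in enumerate(sites):
        treated = (site == s) & (T == 1)
        control = (site == s) & (T == 0)
        gamma_hat[k] = M[treated].mean() - M[control].mean()
        beta_hat[k] = Y[treated].mean() - Y[control].mean()
    return sites, gamma_hat, beta_hat

# Toy data: two sites with four cases each.
site = [0, 0, 0, 0, 1, 1, 1, 1]
T = [1, 1, 0, 0, 1, 1, 0, 0]
M = [2.0, 4.0, 1.0, 1.0, 5.0, 5.0, 2.0, 2.0]
Y = [3.0, 5.0, 1.0, 3.0, 10.0, 12.0, 4.0, 6.0]
sites, gamma_hat, beta_hat = site_itt_estimates(site, T, M, Y)
print(sites, gamma_hat, beta_hat)
```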

We can correct the first problem by regressing the $\hat{\beta}_s$'s on shrunken estimates of $\gamma_s$ and $\gamma_s^2$.
In Appendix A5 we show that

$$E[\hat{\beta}_s \mid \hat{\gamma}_s] = \alpha_0 \gamma_s^* + \alpha_1 \gamma_s^{2*} + \mathrm{Cov}(b_s, g_s)\,\frac{\lambda(\hat{\gamma}_s - \gamma)}{\tau_\gamma}, \tag{15}$$

where $\lambda = \tau_\gamma/(\tau_\gamma + \tau_g)$ is the reliability of the $\hat{\gamma}_s$'s; $\gamma_s^* = E[\gamma_s \mid \hat{\gamma}_s] = \lambda\hat{\gamma}_s + (1 - \lambda)\gamma$; and
$\gamma_s^{2*} = E[\gamma_s^2 \mid \hat{\gamma}_s] = (\gamma_s^*)^2 + \tau_\gamma(1 - \lambda)$. When F is large and $CV_\gamma$ is not small, the expected value of the
final term in Equation (15) will be small. 7 This suggests we can regress the $\hat{\beta}_s$'s on $\gamma_s^*$ and $\gamma_s^{2*}$ (with
no intercept) to estimate $\alpha_0$ and $\alpha_1$. Given the estimates $\hat{\gamma}$, $\hat{\tau}_\gamma$, $\hat{F}$, $\hat{\alpha}_0$, and $\hat{\alpha}_1$, we can then compute
$\hat{\delta}^{bc}$ and $\hat{\delta}^{pi}$ from Equations (9) and (11). If we have large samples within sites, we can estimate the
$\hat{\beta}_s$'s and $\hat{\gamma}_s$'s very reliably, which will lead to precise estimates of $\gamma$, $\tau_\gamma$, $\alpha_0$, and $\alpha_1$, and thus to
precise estimates of $\delta$. Note that $\hat{\delta}^{pi}$ and $\hat{\delta}^{bc}$ rely on the same basic information (both use $\hat{\gamma}$, $\hat{\tau}_\gamma$, and
$\hat{\alpha}_1$; $\hat{\delta}^{bc}$ also uses $\hat{\alpha}_0$, however), but in different ways, suggesting that they may perform somewhat
differently under different conditions.

7 In Appendix A5 we discuss the case where F is small and/or $CV_\gamma$ is small; in such cases, the final term in (15)
may have a large, non-zero expected value, implying that Equation (15) should have a non-zero intercept. In
such cases, however, our simulations show that including an intercept in model (15) leads to a very large
sampling variance of the estimates of the intercept and $\alpha_1$; the loss in precision is far worse than any
reduction in bias achieved.
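
The shrinkage step can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it takes the mean compliance, its between-site variance `tau_gamma`, and a common sampling variance `tau_g` as given, uses the normal-theory conditional mean and second moment of $\gamma_s$ given $\hat{\gamma}_s$, and then fits the no-intercept regression by unweighted least squares. All function and argument names are hypothetical.

```python
import numpy as np

def shrunken_regressors(gamma_hat, gamma_bar, tau_gamma, tau_g):
    """Return (gamma_star, gamma_sq_star, reliability) for the shrunken regression."""
    lam = tau_gamma / (tau_gamma + tau_g)  # reliability of the gamma_hat_s's
    gamma_star = lam * np.asarray(gamma_hat, dtype=float) + (1.0 - lam) * gamma_bar
    # Conditional second moment = squared conditional mean + conditional variance.
    gamma_sq_star = gamma_star ** 2 + tau_gamma * (1.0 - lam)
    return gamma_star, gamma_sq_star, lam

def fit_alphas(beta_hat, gamma_star, gamma_sq_star):
    """No-intercept least squares of beta_hat_s on the two shrunken regressors."""
    X = np.column_stack([gamma_star, gamma_sq_star])
    coef, *_ = np.linalg.lstsq(X, np.asarray(beta_hat, dtype=float), rcond=None)
    return coef[0], coef[1]

# Hypothetical inputs for four sites.
g_star, g_sq_star, lam = shrunken_regressors(
    gamma_hat=[0.0, 2.0, 1.0, 3.0], gamma_bar=1.5, tau_gamma=1.0, tau_g=1.0
)
alpha0_hat, alpha1_hat = fit_alphas([0.1, 2.3, 0.9, 4.0], g_star, g_sq_star)
print(lam, g_star, g_sq_star, alpha0_hat, alpha1_hat)
```

With perfectly reliable estimates (`tau_g = 0`) the regressors reduce to $\hat{\gamma}_s$ and $\hat{\gamma}_s^2$, so the fit becomes the unshrunken quadratic regression through the origin.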

Standard errors for $\hat{\delta}^{bc}$ and $\hat{\delta}^{pi}$

We compute standard errors for $\hat{\delta}^{bc}$ and $\hat{\delta}^{pi}$ via bootstrapping. Specifically, we (i) draw a
sample of K sites, with replacement, from the original sample of sites; (ii) draw a sample of $p \cdot n$
treatment and $(1 - p)n$ control cases, with replacement, separately in each resampled site; (iii)
estimate $\hat{\delta}^{bc}$ and $\hat{\delta}^{pi}$ from this new sample as described above in Equations (9) and (11); (iv)
repeat steps (i)-(iii) many times (we use 500 draws in the simulations described below); and (v)
use the variances of the estimates from these repeated draws as estimates of the sampling
variances of the $\hat{\delta}$'s.
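
Steps (i) through (v) can be sketched generically as below. This is a hedged illustration, not the authors' code: `estimator` stands for any user-supplied function of the resampled data (for example, one that returns a bias-corrected estimate), and the function name, argument layout, and toy data are hypothetical. The sketch assumes every site contains both treatment and control cases.

```python
import numpy as np

def two_stage_bootstrap_se(site, T, data, estimator, n_boot=500, seed=0):
    """Bootstrap SE: resample sites, then treated and control cases within site.

    `estimator(site, T, data)` must return a scalar. Resampled sites receive
    fresh labels, so a site drawn twice is treated as two distinct sites.
    """
    rng = np.random.default_rng(seed)
    site, T, data = np.asarray(site), np.asarray(T), np.asarray(data)
    sites = np.unique(site)
    # Row indices of treated and control cases, stored once per site.
    rows = [(np.flatnonzero((site == s) & (T == 1)),
             np.flatnonzero((site == s) & (T == 0))) for s in sites]
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        idx, labels = [], []
        drawn = rng.integers(0, len(sites), size=len(sites))  # step (i)
        for new_label, k in enumerate(drawn):
            for arm in rows[k]:  # step (ii): each arm keeps its original size
                draw = rng.choice(arm, size=len(arm), replace=True)
                idx.append(draw)
                labels.append(np.full(len(draw), new_label))
        idx = np.concatenate(idx)
        estimates[b] = estimator(np.concatenate(labels), T[idx], data[idx])  # step (iii)
    return estimates.std(ddof=1)  # step (v): square root of the bootstrap variance

# Toy usage with a simple difference in means as the estimator.
rng = np.random.default_rng(42)
site = np.repeat(np.arange(10), 40)
T = np.tile(np.r_[np.ones(20), np.zeros(20)], 10).astype(int)
y = T + rng.normal(size=site.size)
diff_in_means = lambda s, t, d: d[t == 1].mean() - d[t == 0].mean()
print(two_stage_bootstrap_se(site, T, y, diff_in_means, n_boot=200))
```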

V. Simulation Analyses 

We assess the performance of the two bias-corrected IV estimators described in Section IV
using a set of simulations, comparing the results based on the new estimators with those from 2SLS.
Appendix B describes the simulation setup in detail. We vary three parameters across
simulations: the coefficient of variation for compliance ($CV_\gamma$), the expected F statistic, and the
compliance-effect correlation.

Table 2 presents the estimated bias, sampling variation, estimated standard error, and root 
mean square error of the two bias-corrected estimators for a range of simulated populations. 
Columns 1-3 report the parameters used in each simulation; columns 4 through 11 report the 
estimation results for the two bias-corrected IV estimators. For comparison, Columns 12-15 report 
the corresponding 2SLS bias, standard error and RMSE. 
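
For concreteness, the four summary quantities reported for each estimator can be computed from the simulation replicates as in the sketch below. The function name is hypothetical, and representing failed iterations as NaN is an assumption made for this illustration; RMSE is computed as the square root of squared bias plus squared sampling standard deviation.

```python
import numpy as np

def summarize_simulation(estimates, std_errors, true_value):
    """Bias, sampling SD, mean estimated SE, and RMSE across replicates."""
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    ok = ~np.isnan(est)  # drop iterations that produced no estimate
    bias = est[ok].mean() - true_value
    sd = est[ok].std(ddof=1)
    return {
        "n_ok": int(ok.sum()),
        "bias": bias,
        "sd": sd,
        "mean_se": float(np.nanmean(se[ok])),
        "rmse": float(np.sqrt(bias ** 2 + sd ** 2)),
    }

# Hypothetical replicates; the third iteration failed to converge.
summary = summarize_simulation(
    estimates=[1.0, 1.2, np.nan, 0.8],
    std_errors=[0.1, 0.1, np.nan, 0.1],
    true_value=1.0,
)
print(summary)
```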
Bias of the bias-corrected IV estimators in the presence of compliance-effect covariance in finite
samples

Columns 4 and 8 in Table 2 present the estimated bias for the two bias-corrected estimators
across 2000 simulation iterations. 8 Panel A indicates that the magnitude of the estimated bias of
both bias-corrected estimators reaches its minimum value when $CV_\gamma$ equals 1, other things being
equal. As $CV_\gamma$ deviates from 1, the absolute value of estimated bias increases. 9 Thus, the
bias-corrected estimators are most effective at eliminating bias when $CV_\gamma$ is near 1. This is in stark
contrast with the pattern observed in panel A of Table 1 and in columns 12-15 of Table 2, which
show that bias in 2SLS exhibits an inverse "U" shape that reaches its maximum value when $CV_\gamma$ is 1
and diminishes steadily as $CV_\gamma$ deviates from 1 in either direction. Panel A also indicates that
the bias-corrected estimator exhibits less bias than the plug-in estimator when $CV_\gamma < 1$, and more
bias than the plug-in estimator when $CV_\gamma > 1$.

Panel B suggests that both bias-corrected estimators do a good job eliminating bias when 
the first stage F-statistic is large. A comparison between Columns 4, 8, and 12 indicates that, when 
F is extremely small, the absolute value of bias of the bias-corrected estimators is similar to that of 
2SLS. As F increases, the magnitude of the bias shown in Columns 4 and 8 decreases both in 
absolute terms and as a proportion of 2SLS bias. 

Panel C shows that, for cases examined here, bias in the bias-corrected estimator decreases
as $\mathrm{Corr}(\gamma_s, \delta_s)$ increases, other things being equal. Bias in the plug-in estimator appears slightly
larger in the case where $\mathrm{Corr}(\gamma_s, \delta_s)$ is larger than where it is moderate in size, though it is still very
small compared to the bias in the 2SLS estimator. Note that compliance-effect bias in 2SLS or OLS
increases with $\mathrm{Corr}(\gamma_s, \delta_s)$. So results in this panel indicate that, when $CV_\gamma$ is 1 and the F-statistic is
fairly large (e.g., F = 26), both bias-corrected estimators perform very well when they are needed
the most: when the compliance-effect bias is large.

8 Some iterations did not produce an estimate for $\delta$ because the restricted maximum likelihood (RMLE)
model used to obtain shrunken estimates of $\gamma_s$ did not converge. Therefore the actual number of successful
iterations varies by parameter values used in the simulation, ranging from 1,821 to 2,000 out of 2,000 total
iterations.

9 Panel A demonstrates this pattern for an F-statistic of 10. Additional results (not reported here)
demonstrate that while this pattern holds for a wide range of F-statistics, this "U" shape pattern is more
pronounced when the F-statistic is small and becomes more muted as the F-statistic increases.

Sampling variability of the bias-corrected IV estimators in the presence of compliance-effect 
covariance in finite samples 

Columns 5 and 9 report the true sampling variation (the standard deviation of the estimates
across the 2000 simulation samples) of the two bias-corrected estimators, while columns 6 and 10
report the average bootstrapped standard error of the estimates, for each scenario. In general,
except when F is very small, the sampling variance of both bias-corrected estimators is roughly
similar to that of the 2SLS estimates. This suggests that the bias correction does not come at any
significant loss of precision compared to 2SLS (of course, the sampling variances of the
bias-corrected estimators and of 2SLS are much larger than the conventional 2SLS-estimated standard
errors, as shown in column 14). Moreover, the bootstrapped standard errors for the bias-corrected
estimators are very close to the true standard errors, except when F is very small.

Comparing the bias-corrected IV estimators to the 2SLS and OLS estimators in the presence of
compliance-effect covariance in finite samples

Figure 2 compares the estimated bias and RMSE from the 2SLS and bias-corrected IV
estimators under a variety of conditions. The horizontal axis in each graph indicates the first stage
F-statistic and the vertical axis either the bias (left panel of figures) or RMSE (right panel). We
present separate graphs for three values of $CV_\gamma$ (1.0, 0.2, and 0). 10 In each case, $\mathrm{Corr}(\gamma_s, \delta_s)$ is
fixed at 0.25.

In each of the graphs on the left panel, the area below the 2SLS bias line is decomposed into
two parts: the light grey area on top represents the amount of compliance-effect bias (CEB) in the
2SLS estimator and the dark grey area at the bottom represents the finite sample bias component
(FSB) of the 2SLS estimator. This decomposition is based on Equation 5a, and the sum of these two
components closely tracks the estimated 2SLS bias (the sum does not exactly track the bias because
the decomposition is an approximation). These three graphs illustrate that the relative bias of 2SLS
and the bias-corrected estimators depends both on $CV_\gamma$ and the first stage F-statistic. As expected,
the bias-corrected estimators reduce 2SLS bias the most when 2SLS compliance-effect bias is large
relative to the 2SLS finite sample bias.

10 This figure shows how the four estimators behave as $CV_\gamma$ starts to deviate from the optimal value of 1
towards 0. Results are similar to those presented here when $CV_\gamma$ deviates from the optimal value of 1
towards infinity. Figure C1 in Appendix C presents graphical demonstrations of those results.

Specifically, when $CV_\gamma$ is 1, the bias-corrected estimators always have smaller bias than the
2SLS estimator, regardless of the first-stage F-statistic (top graph). This is not surprising since, for
any given F-statistic, the bias-corrected estimators have minimum bias when $CV_\gamma$ is 1, while 2SLS
bias is maximized at this point. The dotted line closely tracks the FSB area (in dark grey), indicating
that, in this case, the bias-corrected estimators are very successful in eliminating almost all of the
compliance-effect bias in the 2SLS estimator, regardless of the F-statistic.

When $CV_\gamma$ is different from 1 but does not lie in the extremes (i.e., $CV_\gamma = 0.2$), the
bias-corrected estimators can still produce a smaller bias than the 2SLS method if the F-statistic is
greater than 10 (middle graph). As $CV_\gamma$ continues to deviate from 1 and reaches the extreme of zero
(i.e., when $\gamma_s$ does not vary across sites), the bias in the bias-corrected estimators approaches the
bias in the 2SLS estimator as the F-statistic increases, but the 2SLS estimator produces the smallest
bias among the four methods for all F-statistics presented here (bottom graph). This is not
surprising since, in this case, there is no compliance-effect bias in the 2SLS estimator (the first term
in Equation 5a is zero), so there is nothing for the alternative method to correct for.

The three graphs on the right side of Figure 2 compare the root mean squared error (RMSE)
of these four estimators. The layout for these graphs is the same as that for the graphs on the left
side except that the vertical axis now represents the RMSE instead of the bias. These three graphs
show that the RMSE for the bias-corrected estimators is generally larger than that for the 2SLS
estimator when F is small (less than 10), but decreases faster as F increases than does the RMSE of
the 2SLS estimator. As a result, the bias-corrected IV estimators have the smallest RMSE when F is
above some threshold, though this threshold depends on $CV_\gamma$: it is smallest when $CV_\gamma$ is 1 (top
graph) and becomes larger as $CV_\gamma$ deviates from 1 (middle and bottom graphs).

Figure 3 provides similar comparisons of the magnitude of bias and RMSE among the three
estimators as a function of the compliance-effect correlation. In these figures, the F-statistic is set
to a value of 26, and the horizontal axis indicates values of $\mathrm{Corr}(\gamma_s, \delta_s)$. All other attributes of the
graph are the same as in Figure 2. 11

Similar to Figure 2, the three graphs on the left side of Figure 3 show that when $CV_\gamma = 1$, the
bias-corrected estimators work well in eliminating the compliance-effect bias in the 2SLS estimator,
especially when the CEB is large (top graph). When $CV_\gamma$ deviates somewhat from 1, the
bias-corrected estimators eliminate some, but not all, of the compliance-effect bias (middle graph).
When there is no compliance-effect bias in the 2SLS estimator (either because $\mathrm{Corr}(\gamma_s, \delta_s) = 0$ or
$CV_\gamma = 0$), the bias in the 2SLS estimator is smaller than that of the bias-corrected estimator.
Nonetheless, the three graphs on the right side of Figure 3 show that, across all cases examined in
this figure, the RMSE of the bias-corrected estimator is always smaller than or equal to that of 2SLS,
even when there is no compliance-effect covariance bias. The RMSE of the plug-in estimator is similar in
magnitude, although it is sometimes larger than that of 2SLS. This suggests that the bias-corrected
IV estimator may be generally preferable to 2SLS as long as F is modestly large (recall that it is 26
in the figures here) and $CV_\gamma < 1$.

It is clear that the combination of $CV_\gamma$, the F-statistic, and $\mathrm{Corr}(\gamma_s, \delta_s)$ affects the
performance of the bias-corrected IV estimators relative to that of the 2SLS estimator. In general,
when the F-statistic is greater than 10, the bias-corrected IV estimators outperform the 2SLS
estimator both in terms of bias and RMSE under a wide range of conditions. This is especially true
when $CV_\gamma$ does not deviate from 1 too much and when $\mathrm{Corr}(\gamma_s, \delta_s)$ is not very close to zero. When
the F-statistic is less than 10, the bias-corrected estimators generally perform worse than 2SLS.
However, because IV methods should generally not be used when the F-statistic is less than 10 (for
example, see Stock and Yogo, 2005), this is not a particularly useful comparison.

11 As in Figure 2, this figure presents situations in which $CV_\gamma$ deviates from 1 towards 0. Results are similar
when $CV_\gamma$ deviates from 1 towards infinity. Results for those cases are presented in Figure C2 of Appendix C.

VI. Empirical Examples 

We now apply 2SLS and the bias-corrected IV estimators to a reanalysis of data from two
studies: (1) the Tennessee class size experiment, Project STAR (e.g., Finn and Achilles, 1990) and
(2) the federal Reading First Impact Study described earlier. For both examples we estimate the
relationship between a hypothesized mediator and an outcome using OLS, 2SLS, and the
bias-corrected estimator. However, the examples represent two very different study designs. Project
STAR randomly assigned a large number of individual students to treatment status in a large
number of sites (schools), whereas the Reading First Impact Study examined student outcomes for
a small number of schools that were assigned to treatment or control status in a small number of
sites. The two examples also differ in terms of the factors that influence the effectiveness of our
bias-corrected estimator: (1) the strength of their instruments, (2) their cross-site variation in
compliance, and (3) their cross-site correlation between compliance and mediator effects. In
addition, the RF study was a cluster-randomized trial: schools, rather than students, were randomly
assigned to treatment conditions. We take this clustering into account in our analyses, but do not
spend time discussing the clustering issue, as it is orthogonal to the key issues of identification and
bias that we focus on in this paper.
Project STAR

Project STAR (Student-Teacher Achievement Ratio) randomized approximately 5,900
entering kindergarten students at 79 elementary schools to either a small class (13-17 students)
or a regular-sized class (22-26 students) (Krueger, 1999; Nye, Hedges, and Konstantopoulos,
2000). Students assigned to a regular-sized class were further randomly assigned to classes with or
without a classroom aide. Because previous analyses found no difference in student outcomes for
students in regular-sized classrooms with or without an aide (Krueger, 1999), we combine these
two groups into a single regular-size classroom group.

The mediator of interest for us is actual class size, which differs from assigned class size
because some students assigned to small classes ended up in classes with 18 or more students, and
some assigned to regular classes had fewer than 22 in their class. Note that this mediator is an
interval-scaled, multi-valued variable rather than a binary "compliance" indicator. Therefore, this
example is not simply a case where we are interested in adjusting the experimental estimates for
non-compliance, but rather are interested in estimating the effect of a one-unit change in class size.
As we show below, actual class size (and the effect of being assigned to a small class) vary
significantly among students, even among those assigned to the same treatment condition. We use
79 instruments: a zero/one indicator for assignment to a small class interacted with a zero/one
indicator for each school. We use OLS, 2SLS with 79 instruments, and the two bias-corrected IV
estimators with 79 instruments to estimate the effect of actual class size on student math and
reading achievement at the end of the kindergarten year for students who were randomized when
they entered kindergarten.
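
The site-by-assignment instruments described above can be built as in the following sketch, which returns one column per school, equal to the assignment indicator for students in that school and zero otherwise. The function name and the toy inputs are hypothetical; this is an illustration of the construction, not the code used for the STAR analysis.

```python
import numpy as np

def site_by_treatment_instruments(site, T):
    """Return site labels and an (n, K) matrix Z with Z[i, k] = T[i] if case i is in site k."""
    site = np.asarray(site)
    T = np.asarray(T, dtype=float)
    sites, site_index = np.unique(site, return_inverse=True)
    Z = np.zeros((len(site), len(sites)))
    Z[np.arange(len(site)), site_index] = T  # fill each row's own site column
    return sites, Z

# Toy example with two schools and five students.
sites, Z = site_by_treatment_instruments(["a", "a", "b", "b", "b"], [1, 0, 1, 1, 0])
print(sites)
print(Z)
```

Each row of the matrix has at most one non-zero entry, so the row sums reproduce the original assignment indicator.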

The left-hand panel of Table 3 summarizes the results of our reanalysis of the STAR data. We
begin by considering the OLS and 2SLS estimates of the effects of class size. The OLS estimates in
Table 3 indicate that, on average, reducing the size of a kindergarten class by one student increases
math achievement by 1.04 scale-score points and increases reading achievement by 0.72 scale-score
points. The corresponding 2SLS estimates are 1.11 points in math and 0.71 points in reading,
estimates that are very close to the OLS results. This similarity is likely because in Project STAR a
very large proportion of the variance in class size was determined by random assignment, leaving
little endogenous variation in class size to produce bias. Hence, unlike many mediators which vary
naturally across individuals in a study sample, and thus may be correlated with their unobserved
characteristics, Project STAR does not appear to have a substantial endogeneity problem.

Prior to estimating the effects of class size using the bias-corrected IV estimators, it is
useful to assess the potential compliance-effect covariance bias that might be present in the 2SLS
estimates. To do so, we examine the F-statistic and estimate $CV_\gamma$, $\tau_\delta$, and $\mathrm{Corr}(\gamma, \delta)$ to determine
whether, based on our simulations reported in Table 2 and Figures 2-3 above, we expect the
bias-corrected estimators to outperform 2SLS. For both math and reading, $CV_\gamma \approx 0.25$ and F > 1,000;
the large F-statistic reflects the facts that variation in class size is largely due to randomization and
that the average sample per school is substantial. Using the methods described in Raudenbush,
Reardon, and Nomi (2012) and in Appendix D, we estimate $\tau_\delta \approx 3.5$ for both math and reading,
and $\mathrm{Corr}(\gamma, \delta) = -0.24$ and $-0.36$ in math and reading, respectively. These values suggest that the
bias-corrected estimators should perform extremely well. Based on Figure 2, when $CV_\gamma = 0.2$ and
$\mathrm{Corr}(\gamma, \delta) = 0.25$, both bias-corrected estimators are substantially less biased and have smaller
RMSE than 2SLS when F is 100. Given that F is even larger in the STAR example (and given that 2SLS
bias does not decline significantly after F is above 10), we prefer the bias-corrected IV estimates for
these STAR analyses. Based on these values, Equation (5a) implies that the compliance-effect
covariance bias in the 2SLS estimator is roughly 0.27 in both math and reading; this is a moderate
amount of bias relative to the 2SLS effect estimates of -1.11 and -0.71.

The two bias-corrected IV estimates (reported at the bottom of Table 3) are larger (18 to 35
percent larger, in fact) than their 2SLS counterparts. They imply that reducing the size of a
kindergarten class by one student increases average student achievement by 1.32 or 1.39
scale-score points for math and 0.96 or 0.97 scale-score points for reading (depending on which of the
two bias-corrected estimators we use). Expressed as effect sizes, these results imply a roughly 0.03
standard deviation increase in test scores per student of class size reduction. Note, however, that
the standard errors of the bias-corrected estimates are 15-20% larger than the 2SLS- and
OLS-estimated standard errors, and that the confidence intervals for the 2SLS, OLS, and bias-corrected
estimates overlap considerably. For Project STAR, where variation in the mediator was mainly
induced by randomization (and thus mainly exogenous) and where there are numerous
randomized individuals per block and numerous blocks, the four estimation approaches yield
roughly comparable point estimates and statistical inferences. Nonetheless, although our
conclusions about the effectiveness of reducing class sizes may not change much depending on
which estimator we use in this case, the values of $CV_\gamma$, F, and $\mathrm{Corr}(\gamma, \delta)$ and the simulations in
Section V suggest that the two bias-corrected estimates are to be preferred to the 2SLS or OLS
estimates in this example. As Figure 2 shows, when $CV_\gamma \approx 0.2$, $\mathrm{Corr}(\gamma, \delta) \approx 0.25$, and F > 100,
the two bias-corrected estimators have very similar bias and RMSE; we have no clear way to choose
between them in this case (nor do we need to, as they yield very similar estimates).

Another potential way to assess the impact of compliance-effect covariance bias is to
examine the estimates of $\alpha_1$. Because these estimates for Project STAR are statistically significant
(at least in the case of reading), they provide reliable evidence of a true departure from linearity in
the relationship between the effect of randomization on student achievement ($\beta$) and the effect of
randomization on class size ($\gamma$). This departure from linearity implies the presence of
compliance-effect covariance bias.

To help visualize this relationship, Figure 4 presents a graph of reduced-form OLS estimates
of $\beta$ and Empirical Bayes estimates of $\gamma$ for each school in the sample. Superimposed on this
scatter-plot is the estimated quadratic relationship implied by the estimates of $\alpha_0$ and $\alpha_1$ in Table 3.
The top graph is for reading and the bottom graph is for math. Because it is difficult to see a pattern
in the plotted points, consider what is implied by the fitted curve. Sites in which there was a
greater reduction in class size as a result of treatment assignment have, on average, a larger
increase in test scores as a result of treatment assignment, but this association does not appear to
be linear. This nonlinearity implies a covariance between the site-average compliance levels and
site-average effects: a unit change in class size appears to affect test scores the most, on average,
in the schools where random assignment induced a smaller change in class size. This might result
from a non-linearity in the underlying relationship between class size and achievement.

Reading First 

The Reading First Impact Study was conducted in 18 sites (comprising 17 school districts
and one statewide program) where between 6 and 32 schools per site were assigned to treatment
or comparison condition status. 12 Data from the study make it possible to estimate program
impacts on RF instructional time (the mediator of interest). In addition, estimates were obtained for
program impacts on student reading achievement measured by SAT10 reading scale scores for
three annual student cohorts in grades one and two. The smallest block for estimating impacts is a
single cohort in a single grade from a single site. There are 108 such blocks. Because the unit of
assignment to Reading First is schools, the effective sample size of these blocks is quite small and
the strength of instruments created by interacting assigned treatment status with zero/one block
indicators is quite weak (their first stage F-statistic is 3.48). Thus our analyses are based on 36
blocks (which pool student cohorts within grade-by-site cells) or 18 blocks (which pool student
cohorts and grades within sites).

As we reported in the introduction above, an IV analysis with a single instrument indicates
that, on average, student reading achievement increased by 0.37 scale score points (the ratio of the
estimated impact on reading achievement, 4.29 scale score points, to the estimated impact on RF
instructional time) per additional minute of RF instruction. The right side of Table 3 reports
corresponding results obtained from OLS, 2SLS with multiple instruments, and the two
bias-corrected IV estimators with multiple instruments. The OLS estimates indicate a very small
mediator effect: an additional 0.037 or 0.122 scale score points per minute of RF instruction per
daily reading block (which correspond to effect sizes of 0.001 and 0.003 per minute of instruction
for 18 blocks or 36 blocks, respectively). The 2SLS estimates of this mediator effect are much larger:
0.397 or 0.387 for 18 or 36 blocks, respectively (effect sizes of roughly 0.01), estimates that are very
close to the single-instrument estimate of 0.37 points per minute of RF instruction.

12 Treatment was not assigned randomly in most of the RF sites, but was rather assigned on the basis of an
observed rating score. Our analysis here, like the impact analysis reported by Gamse et al. (2008), is based on
a regression discontinuity design, but that feature of the analysis is not essential to our exposition and so is
excluded for simplicity.

The corresponding bias-corrected IV estimates are 0.365 for 18 blocks and 0.484 for 36 
blocks. Hence, they are roughly comparable to the estimates produced by 2SLS. This is especially true 
for the finding based on 18 blocks, where the first-stage F-statistic for 2SLS (17.7) suggests that one 
can have some confidence in the bias-corrected estimators. This suggests that the Reading First 
example might not involve substantial compliance-effect covariance bias. To explore this issue it would 
be useful to examine the quadratic coefficient in the regressions used to produce the bias-corrected 
estimates. However, as can be seen from Table 3, this coefficient is not estimated precisely enough 
to provide information that is useful for this purpose. 

Several further points about these findings are important to consider. Note first that 
estimated standard errors are not presented for the OLS, 2SLS, or bias-corrected estimators. This is 
because the small number of schools in each block (the smallest blocks have only 6 schools) does not 
support valid bootstrapped standard errors (Freedman, 2005). Thus, for this example, it is not 
possible to use bootstrapped standard errors to provide statistical inferences for any of the 
estimators.13 This problem is likely to arise frequently when aggregate units (clusters) are assigned 
to treatment or control status, which typically results in small numbers of aggregate units per block. 

13 For the 2SLS and OLS estimators, it is possible to obtain estimated standard errors through conventional 
methods based on standard software packages. However, as demonstrated earlier in the paper, those 
standard errors tend to understate the sampling variation, especially when first stage F is small. Therefore 
conventional standard errors for the OLS and 2SLS estimators are not reported in Table 3 either. 
Note second that estimates of mediator effects produced by 2SLS and the bias-corrected estimators 
are many times larger than those produced by OLS. This probably reflects attenuation bias in the 
OLS estimates that is created by a lack of reliability in the observational measure of RF instructional 
time (each classroom was only observed by a single rater during a single 60-90 minute reading 
block). Neither 2SLS nor the bias-corrected estimators are subject to this problem. 

In summary, Project STAR illustrates a situation in which the bias-corrected estimators are 
likely to work quite well: the F-statistic is unusually large (over 1,000), the coefficient of variation 
for compliance equals about 0.26, and the number of observations per block (over 70) is large 
enough to support accurate bootstrapped standard errors. Reading First provides a much more 
limited application. The F-statistic is 17.7 or 8.2, the coefficient of variation for compliance is 0.76 
or 0.79, and the number of observations per block (ranging from 6 to 32) is too small to support 
bootstrapped standard errors. 

VII. Discussion and Conclusion 

The use of multiple site-by-treatment status instruments to identify the effects of the 
mediators of a treatment in a multi-site trial is a potentially promising method, though it does not 
come without some complexity. In addition to the usual set of assumptions required for 
identification in instrumental variables models, an additional assumption is required: that there is 
no correlation between the site-average compliance rates and the site-average effects of the 
mediator (Reardon and Raudenbush, forthcoming). This assumption is required 
regardless of whether the goal is to identify a complier average causal effect (a LATE, in Angrist, 
Imbens, and Rubin's 1996 terminology) or an average effect in a population (ATE). Note that in 
2SLS estimation (and in other parametric IV methods), the assumption of compliance-effect 
independence implies that the relationship between the site-specific intent-to-treat effects (the $\beta_s$'s 
in our notation) and the site-specific average compliances (the $\gamma_s$'s) is linear. If the 
compliance-effect independence assumption is not met, then the implicit linearity assumption in the 
multiple-instrument 2SLS model will lead to biased estimates. However, this bias is not unique to 2SLS 
MSMM-IV estimation: it is present as well in some other standard methods of IV analysis, such as 
the limited information maximum likelihood IV estimator (LIML). Moreover, Raudenbush, 
Reardon, and Nomi (2012) note that even a non-parametric method such as averaging estimates 
from multiple sites (which might themselves be estimated using any one of a number of IV 
estimators) using precision weights will suffer from the same between-site compliance-effect 
covariance bias that we describe here. 

Reardon and Raudenbush (forthcoming, Appendix C) derive an asymptotic expression for 
the 2SLS bias due to compliance-effect covariance, but do not consider how compliance-effect 
covariance bias may interact with finite sample bias. Here we have shown that the magnitude of 
the compliance-effect covariance bias depends on the strength of the instruments. We have derived 
an analytic expression approximating the magnitude of both finite sample bias and 
compliance-effect covariance bias. This expression shows that, ceteris paribus, the magnitude of 
compliance-effect covariance bias increases asymptotically as the instruments grow stronger, while finite 
sample bias decreases. Thus, a strong set of instruments is no guarantee against compliance-effect 
covariance bias. Our simulations illustrate that the bias formula closely matches the true bias over 
a wide range of the parameter space, and demonstrate that the bias due to compliance-effect 
covariance may be substantial. 

To address this problem, we develop two closely related alternative instrumental variables 
estimators: the bias-corrected IV estimator and the plug-in bias-corrected IV estimator. Our 
simulations show that these two estimators perform very well over a wide range of conditions 
when the first-stage F-statistic is greater than 10. In this situation, as long as $CV_\gamma$ is not too extreme 
and Corr($\gamma_s$, $\delta_s$) is not very close to zero, the bias-corrected estimators generally outperform the 
2SLS estimator in terms of both bias and RMSE. Note that both the coefficient of variation for 
compliance and the first-stage F-statistic can easily be estimated from the data, so researchers 
can readily assess whether it is preferable to use the bias-corrected estimators. 
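
As a rough illustration of this check, the sketch below computes the coefficient of variation of compliance from an estimated mean and between-site variance of compliance, and compares the first-stage F-statistic with the threshold of 10 discussed above. The function and argument names are hypothetical and are not part of our analysis code; the inputs in the usage line are the Project STAR math values reported in Table 3.

```python
import math

def compliance_diagnostics(gamma_bar, tau_gamma, f_stat, f_threshold=10.0):
    """Return (CV of compliance, whether the first-stage F exceeds the threshold).

    gamma_bar : estimated mean site-level compliance
    tau_gamma : estimated between-site variance of compliance
    f_stat    : first-stage F-statistic
    The argument names and the default threshold are illustrative assumptions.
    """
    cv = math.sqrt(tau_gamma) / abs(gamma_bar)
    return cv, f_stat > f_threshold

# Project STAR math column of Table 3: gamma = -7.25, tau_gamma = 3.45, F = 1082.1
cv, strong = compliance_diagnostics(gamma_bar=-7.25, tau_gamma=3.45, f_stat=1082.1)
print(round(cv, 2), strong)  # prints: 0.26 True
```

The computed value agrees with the coefficient of variation of 0.26 reported for Project STAR.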

The two bias-corrected estimators rely on a weaker assumption than the 2SLS estimator. 
While 2SLS requires the assumption that the site-average compliances and the site-average effects 
of the mediator are independent, the bias-corrected estimators require only that the 
association between the site-average compliances and the site-average effects be linear. This is a 
significantly more plausible assumption than the assumption of no association. The bias-corrected 
estimators are therefore preferable to 2SLS in a wide range of situations for the analysis of 
mediator effects in multi-site trials. 

Several general caveats are important to note here. First, because IV models rely heavily on 
the exclusion restriction for identification of mediator effects, IV analysis is suitable for mediation 
analysis only when the exclusion restriction is valid — that is, only when the effect of the instrument 
on the outcome is fully mediated by the specified mediator or mediators. Partial mediation models, 
in which there may be a direct effect of the instrument as well as mediated effects, rely on 
fundamentally different assumptions and different analytic strategies than those we have described 
here. IV models for mediation require that we specify and measure all mechanisms through which 
an instrument affects an outcome. 

Second, our focus in this paper has been on reducing the 2SLS bias caused by between-site 
compliance-effect covariance. If, however, the mediator is not binary, and the researcher wishes to 
estimate an average effect of the mediator in the population, there may be additional bias caused by 
within-site compliance-effect covariance (Reardon and Raudenbush, forthcoming). Such potential 
bias is a feature of all 2SLS estimators, whether they rely on a single instrument or multiple 
instruments. In principle, this bias may be larger or smaller than the bias due to between-site 
compliance-effect covariance, depending on the magnitudes of the covariances and the strength of 
the instruments. Methods of detecting and correcting such within-site compliance-effect 
covariance bias have been suggested elsewhere (Heckman and Vytlacil 1999; Reardon and 
Raudenbush, forthcoming); we do not discuss them here. 
Figure 2: Bias and RMSE of Four Estimators by F-statistic and $CV_\gamma$, when Corr($\gamma_s$, $\delta_s$) = 0.25 
[Figure not reproduced. Panels: $CV_\gamma$ = 1; $CV_\gamma$ = 0.2; $CV_\gamma$ = 0. Vertical axis: Bias. Series: Bias-Corrected Estimator, 2SLS Estimator, OLS Estimator, Plug-in Bias-Corrected Estimator.] 

Figure 3: Bias and RMSE of Four Estimators by Corr($\gamma_s$, $\delta_s$) and $CV_\gamma$, when F-statistic = 26 
[Figure not reproduced. Vertical axis: Bias.] 

Figure 4. Relationship between Reduced-form OLS Estimates of $\beta_s$ and Empirical Bayes 
Estimates of $\gamma_s$ for Each School in the Tennessee STAR Sample, for Kindergarten Reading and 
Math Test Scores 
[Figure not reproduced. Panel 4a: Reading.] 
Table 1. Estimated Bias and Root Mean Squared Error of Multiple-Site, Multiple-Instrument 2SLS Estimator 

   Data Generating Parameters                   2SLS Estimator                                OLS Estimator
Case   CV_γ     F     Corr  sd(δ)    Pred.     Est.     True     Avg.     RMSE    Pred.     Est.     True     Avg.     RMSE
                                      Bias     Bias       se       se              Bias     Bias       se       se
        (1)   (2)      (3)    (4)      (5)      (6)      (7)      (8)      (9)     (10)     (11)     (12)     (13)     (14)

Panel A: CV_γ varies
   1      0    10     0.25      1    0.050    0.051    0.173    0.064    0.180    0.479    0.483    0.142    0.013    0.503
   2    0.2    10     0.25      1    0.137    0.137    0.173    0.061    0.221    0.482    0.487    0.139    0.013    0.506
   3      1    10     0.25      1    0.275    0.283    0.223    0.061    0.361    0.489    0.494    0.139    0.013    0.513
   4      5    10     0.25      1    0.137    0.151    0.256    0.062    0.297    0.482    0.488    0.139    0.013    0.507
   5      ∞    10     0.25      1    0.050    0.074    0.259    0.064    0.270    0.479    0.485    0.140    0.013    0.504

Panel B: Expected F-statistic varies
   6      1     2     0.25      1    0.375    0.387    0.243    0.135    0.457    0.499    0.504    0.139    0.013    0.523
   7      1     5     0.25      1    0.300    0.309    0.229    0.086    0.385    0.495    0.500    0.139    0.013    0.519
   8      1    10     0.25      1    0.275    0.283    0.223    0.061    0.361    0.489    0.494    0.139    0.013    0.513
   9      1    26     0.25      1    0.260    0.267    0.220    0.039    0.346    0.472    0.478    0.139    0.013    0.497
  10      1   101     0.25      1    0.252    0.259    0.218    0.021    0.339    0.416    0.423    0.146    0.013    0.447

Panel C: Corr(γ_s, δ_s) varies
  11      1    10    -0.75      1   -0.625   -0.603    0.240    0.078    0.649    0.445    0.446    0.145    0.013    0.469
  12      1    10    -0.25      1   -0.175   -0.157    0.225    0.067    0.275    0.467    0.471    0.139    0.013    0.491
  13      1    10        0      1    0.050    0.063    0.223    0.063    0.232    0.478    0.483    0.138    0.013    0.502
  14      1    10     0.25      1    0.275    0.283    0.223    0.061    0.361    0.489    0.494    0.139    0.013    0.513
  15      1    10     0.75      1    0.725    0.720    0.234    0.061    0.757    0.511    0.517    0.142    0.013    0.536

Panel D: sd(δ) varies
  16      1    10     0.25      0    0.050    0.051    0.045    0.044    0.068    0.478    0.479    0.009    0.010    0.479
  17      1    10     0.25    0.2    0.095    0.096    0.061    0.044    0.114    0.480    0.482    0.028    0.012    0.483
  18      1    10     0.25      1    0.275    0.283    0.223    0.061    0.361    0.489    0.494    0.139    0.013    0.513
  19      1    10     0.25      5    1.175    1.221    1.101    0.234    1.644    0.533    0.558    0.696    0.013    0.892

Note: Corr denotes Corr(γ_s, δ_s); "True se" and "Avg. se" refer to the true and average estimated standard errors of the 
estimate of δ. Details of simulation in Appendix B. In each row, the following parameters are used: each simulated data set 
has 50 sites, with 200 observations within each site, 50% of which are assigned to the treatment condition. The variances of 
the first and second stage error terms are set to 1, and their correlation is set to 0.5. In column (5), the 
predicted bias is computed from Equation (5a); in column (10), the predicted bias is computed from Equation (4a). The RMSE 
in column (9) is computed as the square root of the sum of the squares of columns (6) and (7). The RMSE in column (14) is 
computed as the square root of the sum of the squares of columns (11) and (12). 
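
The RMSE calculation described in the note can be sketched as follows. The function name is hypothetical; the inputs are the estimated bias and true standard error of the 2SLS estimator for case 1 of Table 1.

```python
import math

def rmse(estimated_bias, true_se):
    """Root mean squared error from bias and sampling standard deviation."""
    return math.sqrt(estimated_bias**2 + true_se**2)

# Case 1, 2SLS columns (6) and (7) of Table 1
case1_rmse = rmse(0.051, 0.173)
print(f"{case1_rmse:.3f}")  # prints: 0.180
```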
Table 2. Estimated Bias and RMSE of Bias-Corrected IV Estimator and Multiple-Site, Multiple-Instrument 2SLS IV Estimator 

Data Generating Parameters     Bias-Corrected IV Estimator       Plug-in Bias-Corrected IV Estimator          2SLS Estimator
Case   CV_γ     F     Corr     Est.     True     Avg.     RMSE     Est.     True     Avg.     RMSE     Est.     True     Avg.     RMSE
                               Bias       se       se              Bias       se       se              Bias       se       se
        (1)   (2)      (3)      (4)      (5)      (6)      (7)      (8)      (9)     (10)     (11)     (12)     (13)     (14)     (15)

Panel A: CV_γ varies
   1      0    10     0.25   -0.178    0.144    0.160    0.228   -0.258    0.133    0.160    0.290    0.051    0.173    0.064    0.180
   2    0.2    10     0.25   -0.182    0.152    0.170    0.237   -0.265    0.142    0.150    0.301    0.137    0.173    0.061    0.221
   3      1    10     0.25    0.083    0.278    0.270    0.290   -0.007    0.245    0.190    0.245    0.283    0.223    0.061    0.361
   4      5    10     0.25    0.189    0.248    0.220    0.312    0.072    0.238    0.210    0.249    0.151    0.256    0.062    0.297
   5      ∞    10     0.25    0.193    0.251    0.230    0.317    0.073    0.241    0.220    0.252    0.074    0.259    0.064    0.270

   1      0    26     0.25   -0.062    0.140    0.148    0.153   -0.098    0.135    0.140    0.167    0.020    0.157    0.040    0.158
   2    0.2    26     0.25   -0.062    0.142    0.152    0.155   -0.100    0.138    0.150    0.171    0.116    0.159    0.039    0.196
   3      1    26     0.25    0.039    0.230    0.223    0.233    0.002    0.217    0.200    0.217    0.267    0.220    0.039    0.346
   4      5    26     0.25    0.080    0.227    0.213    0.241    0.040    0.236    0.220    0.239    0.129    0.257    0.040    0.287
   5      ∞    26     0.25    0.082    0.230    0.215    0.244    0.041    0.240    0.220    0.244    0.043    0.260    0.041    0.264

Panel B: Expected F-statistic varies
   6      1     2     0.25   -0.349    2.664    0.774    2.687   -0.346    1.052    0.270    1.107    0.387    0.243    0.135    0.457
   7      1     5     0.25    0.114    0.435    0.359    0.450   -0.044    0.337    0.270    0.340    0.309    0.229    0.086    0.385
   8      1    10     0.25    0.083    0.278    0.270    0.290   -0.007    0.245    0.190    0.245    0.283    0.223    0.061    0.361
   9      1    26     0.25    0.039    0.230    0.223    0.233    0.002    0.217    0.200    0.217    0.267    0.220    0.039    0.346
  10      1   101     0.25    0.014    0.215    0.210    0.215    0.003    0.210    0.200    0.210    0.259    0.218    0.021    0.339

Panel C: Corr(γ_s, δ_s) varies
  11      1    26     0.00    0.058    0.236    0.229    0.243    0.021    0.221    0.210    0.222    0.030    0.223    0.040    0.232
  12      1    26     0.25    0.039    0.230    0.223    0.233    0.002    0.217    0.200    0.217    0.270    0.223    0.040    0.361
  13      1    26     0.75   -0.001    0.189    0.189    0.189   -0.039    0.209    0.190    0.213    0.730    0.234    0.040    0.757

Note: Corr denotes Corr(γ_s, δ_s); "True se" and "Avg. se" refer to the true and average estimated standard errors of the 
estimate of δ. Details of simulation in Appendix B. In each row, δ = 1 and sd(δ) = 1. All additional parameters are set as 
described in Table 1. Columns (5) and (9) report the standard deviation of the distribution of estimates of δ over 2000 samples. 
Column (6) reports the average bootstrapped standard error (see text for description of bootstrapping procedure) over 100 samples 
(bootstrapped standard errors were computed for only 100 iterations due to computational time). The RMSE in columns (7) and (11) 
are computed as described in Table 1. 
Table 3. Estimated Mediator Effects Using Empirical Data 

                                                     Project STAR                  Reading First
                                                   Math        Reading       18 Blocks     36 Blocks

OLS Estimator
  δ estimate                                  -1.039 **      -0.718 **          0.037         0.122
  Bootstrapped s.e.(δ)                          (0.340)        (0.230)         (n.a.)        (n.a.)

2SLS Estimator
  δ estimate                                  -1.114 **      -0.714 **          0.397         0.387
  Bootstrapped s.e.(δ)                          (0.350)        (0.230)         (n.a.)        (n.a.)

Observable/Estimable Parameters
  F-statistic                                    1082.1         1071.5           17.7           8.2
  τ_γ                                              3.45           3.47          63.25         68.30
  γ                                               -7.25          -7.26          10.47         10.45
  CV_γ                                             0.26           0.26           0.76          0.79
  Estimated τ_δ                                   5.814          2.216          0.531         0.363
  Estimated Corr(γ_s, δ_s)                       -0.240         -0.357          0.216        -0.009
  Estimated 2SLS Compliance-Effect
    Covariance Bias                               0.279          0.256          0.143        -0.005

Estimates from Quadratic Regression
  α_0                                          -3.583 *      -3.025 **          0.157         0.491
  s.e.(α_0)                                     (1.546)        (0.959)        (1.025)       (0.783)
  α_1                                          -0.312 +       -0.285 *          0.020        -0.001
  s.e.(α_1)                                     (0.187)        (0.116)        (0.060)       (0.045)

Bias-Corrected Estimator
  δ estimate                                  -1.319 **     -0.957 ***          0.365         0.484
  Bootstrapped s.e.(δ)                          (0.420)        (0.260)         (n.a.)        (n.a.)

Plug-in Bias-Corrected Estimator
  δ estimate                                  -1.392 **     -0.969 ***          0.254         0.127
  Bootstrapped s.e.(δ)                          (0.418)        (0.269)         (n.a.)        (n.a.)

N (sites/blocks)                                     79             79             18            36
N (observations)                                  5,871          5,789            248           248

Note: + p<.10; * p<.05; ** p<.01; *** p<.001. Estimated compliance-effect covariance bias 
computed from Equation (5a). Bootstrapped standard errors computed as described in text. 
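Appendix A 


Within a given site s, let the data generating model be 

\[
M_i = \Lambda_s + \gamma_s T_i + e_i, \qquad e_i \sim N(0, \sigma^2)
\]
\[
Y_i = \Theta_s + \delta_s M_i + u_i, \qquad u_i \sim N(0, \omega^2)
\]
\[
\begin{pmatrix} e_i \\ u_i \end{pmatrix} \sim N\left[ \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma^2 & \rho\sigma\omega \\ \rho\sigma\omega & \omega^2 \end{pmatrix} \right],
\]

where $\rho$ is the within-site correlation of $e_i$ and $u_i$. Across sites, the covariance matrix of the $\gamma_s$'s and 
the $\delta_s$'s is 

\[
\begin{pmatrix} \gamma_s \\ \delta_s \end{pmatrix} \sim \left[ \begin{pmatrix} \gamma \\ \delta \end{pmatrix}, \begin{pmatrix} \tau_\gamma & \tau_{\gamma\delta} \\ \tau_{\gamma\delta} & \tau_\delta \end{pmatrix} \right]. \tag{1}
\]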

A1: Derivation of the population F-statistic (Equation 3) 14 

Suppose $W$ is distributed as a non-central chi-square with $df = \nu_1$ and non-centrality 
parameter $\lambda$, and $U$ is distributed as a central chi-square with $df = \nu_2$ independently of $W$. Then 
$F = \dfrac{W/\nu_1}{U/\nu_2}$ is distributed as $F(\nu_1, \nu_2, \lambda)$, a non-central F with numerator degrees of freedom $\nu_1$, 
denominator degrees of freedom $\nu_2$, and non-centrality parameter $\lambda$. This variable has mean 
(Johnson & Kotz, 1994) 

\[
E[F] = \frac{\nu_2}{\nu_2 - 2}\left(1 + \frac{\lambda}{\nu_1}\right). \tag{A1.1}
\]

When $\nu_2$ is large, $\frac{\nu_2}{\nu_2 - 2} \approx 1$, so we have 

\[
E[F] \approx 1 + \frac{\lambda}{\nu_1}. \tag{A1.2}
\]

Now consider the data generating model given above. Define 

\[
z_s = \frac{\hat\gamma_s}{se(\hat\gamma_s)} = \frac{\hat\gamma_s}{\sigma/\sqrt{np(1-p)}},
\]

the ratio of the sample mean difference between experimental and control groups in site s to its standard 
error. Then $z_s$ is distributed as a non-central Z with non-centrality parameter 
$\lambda_s = E[z_s] = \dfrac{\gamma_s}{\sigma/\sqrt{np(1-p)}}$. It follows that $W = \sum_{s=1}^{K} z_s^2$ is distributed as a non-central 
chi-square with degrees of freedom $\nu_1 = K$ and non-centrality parameter 

\[
\lambda = E\left[\sum_{s=1}^{K} \lambda_s^2\right] = E\left[\sum_{s=1}^{K} \frac{\gamma_s^2}{\sigma^2/np(1-p)}\right] = \frac{Knp(1-p)(\gamma^2 + \tau_\gamma)}{\sigma^2}. \tag{A1.3}
\]

Now define 

\[
U = \sum_{s=1}^{K}\sum_{i=1}^{n} \hat e_{is}^2 / \sigma^2. \tag{A1.4}
\]

$U$ is distributed as a central chi-square with $df = \nu_2 = K(n-2)$. Now note that $F = \dfrac{W/\nu_1}{U/\nu_2}$ is the 
F-statistic for the test of the null hypothesis that the instrument has no effect in every site, 
$H_0: \gamma_s = 0, \forall s$, or, alternately, $H_0: \sum_{s=1}^{K} \gamma_s^2 = 0$. So long as $\nu_2 = K(n-2)$ is large, Equation (A1.2) 
yields Equation (3): 

\[
E[F] \approx 1 + \frac{\lambda}{\nu_1} = 1 + \frac{np(1-p)(\gamma^2 + \tau_\gamma)}{\sigma^2}. \tag{3}
\]

14 We thank Steve Raudenbush for providing this derivation. 
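
As a numerical illustration of Equation (3), the following minimal sketch evaluates the approximate expected first-stage F-statistic for a given mean compliance, between-site compliance variance, site size, and treatment proportion. The function name is hypothetical, and the example inputs ($\gamma = 0.3$, $\tau_\gamma = 0.09$, so that $CV_\gamma = 1$) are illustrative values chosen to be consistent with the simulation settings of Appendix B (n = 200, p = 0.5, $\sigma^2 = 1$).

```python
def expected_first_stage_f(gamma, tau_gamma, n, p, sigma2=1.0):
    """Approximate E[F] from Equation (3): 1 + n p (1 - p) (gamma^2 + tau_gamma) / sigma^2."""
    return 1.0 + n * p * (1.0 - p) * (gamma**2 + tau_gamma) / sigma2

# Illustrative inputs (an assumption, not values quoted from the text)
f_expected = expected_first_stage_f(gamma=0.3, tau_gamma=0.09, n=200, p=0.5)
print(round(f_expected, 6))
```

With these inputs the expected F-statistic is approximately 10, the threshold used throughout the paper.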


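A2: Derivation of OLS bias (Equation 4a) 

Let $X_i^+ = X_i - \bar X_s$ denote the within-site centered value of a variable $X$. Then centering both 
sides of Equation (2b) and substituting in the centered version of (2a) yields 

\[
\begin{aligned}
Y_i^+ &= \delta_s M_i^+ + u_i^+ \\
&= \delta M_i^+ + \left[(\delta_s - \delta)M_i^+ + u_i^+\right] \\
&= \delta M_i^+ + \left[(\delta_s - \delta)(\gamma_s T_i^+ + e_i^+) + u_i^+\right] \\
&= \delta M_i^+ + \left[(\delta_s - \delta)\left(\gamma T_i^+ + (\gamma_s - \gamma)T_i^+ + e_i^+\right) + u_i^+\right].
\end{aligned} \tag{A2.1}
\]

Estimating $\delta$ via OLS yields 

\[
\begin{aligned}
E[\hat\delta^{OLS}] &= \frac{E[Cov(Y_i^+, M_i^+)]}{Var(M_i^+)} \\
&= \frac{E\left[Cov\left(\delta M_i^+ + \left[(\delta_s - \delta)\left(\gamma T_i^+ + (\gamma_s - \gamma)T_i^+ + e_i^+\right) + u_i^+\right],\; M_i^+\right)\right]}{Var(M_i^+)} \\
&= \delta + \frac{E\left[Cov\left(\left[(\delta_s - \delta)\left(\gamma T_i^+ + (\gamma_s - \gamma)T_i^+ + e_i^+\right) + u_i^+\right],\; \gamma T_i^+ + (\gamma_s - \gamma)T_i^+ + e_i^+\right)\right]}{Var\left(\gamma T_i^+ + (\gamma_s - \gamma)T_i^+ + e_i^+\right)} \\
&= \delta + \frac{E\left[2\gamma\tau_{\gamma\delta}Var(T_i^+) + Cov(u_i^+, e_i^+)\right]}{\gamma^2 Var(T_i^+) + \tau_\gamma Var(T_i^+) + Var(e_i^+)} \\
&= \delta + \frac{2p(1-p)\gamma\tau_{\gamma\delta} + \rho\sigma\omega}{p(1-p)(\gamma^2 + \tau_\gamma) + \sigma^2} \\
&= \delta + \frac{\dfrac{np(1-p)}{\sigma^2}\,2\gamma\tau_{\gamma\delta} + \dfrac{n\rho\omega}{\sigma}}{\dfrac{np(1-p)}{\sigma^2}(\gamma^2 + \tau_\gamma) + n} \\
&= \delta + \frac{\dfrac{np(1-p)}{\sigma^2}\,2\gamma\tau_{\gamma\delta} + \dfrac{n\rho\omega}{\sigma}}{F + n - 1} \\
&= \delta + \rho\frac{\omega}{\sigma}\left(\frac{n}{F + n - 1}\right) + \frac{2\gamma\tau_{\gamma\delta}}{\gamma^2 + \tau_\gamma}\left(\frac{F - 1}{F + n - 1}\right).
\end{aligned} \tag{A2.2}
\]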


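A3: Derivation of 2SLS bias (Equation 5a) 

Combining Equations (2a) and (2b) yields the reduced form equation 

\[
\begin{aligned}
Y_i &= \Theta_s + \delta_s\Lambda_s + \delta_s\gamma_s T_i + \delta_s e_i + u_i \\
&= A_s + \beta_s T_i + \epsilon_i, \qquad \epsilon_i \sim N(0,\; \delta_s^2\sigma^2 + \omega^2 + 2\delta_s\rho\sigma\omega).
\end{aligned} \tag{A3.1}
\]

We begin by fitting Equation (2a) via OLS. This yields estimates of the average compliance in each 
site s: 

\[
\hat\gamma_s = \gamma_s + v_s, \qquad v_s \sim N\left(0, \frac{\sigma^2}{np(1-p)}\right). \tag{A3.2}
\]

We also can estimate, within each site, the average ITT effect $\beta_s$. Here we have: 

\[
\hat\beta_s = \beta_s + \nu_s, \qquad \nu_s \sim N\left(0, \frac{\delta_s^2\sigma^2 + \omega^2 + 2\delta_s\rho\sigma\omega}{np(1-p)}\right).
\]

In finite samples: 

\[
\hat\gamma_s = \gamma_s + g_s \tag{A3.3}
\]
\[
\hat\beta_s = \beta_s + b_s, \tag{A3.4}
\]

where $g_s = (\bar e_s^1 - \bar e_s^0)$ and $b_s = \delta_s(\bar e_s^1 - \bar e_s^0) + (\bar u_s^1 - \bar u_s^0)$, and where $\bar e_s^t$ and $\bar u_s^t$ are the average 
values of the error terms $e_i$ and $u_i$ among those with $T = t$ in the site s sample. Now, 

\[
\begin{aligned}
Cov(g_s, b_s) &= Cov\left(\bar e_s^1 - \bar e_s^0,\; \delta_s(\bar e_s^1 - \bar e_s^0)\right) + Cov\left(\bar e_s^1 - \bar e_s^0,\; \bar u_s^1 - \bar u_s^0\right) \\
&= \delta\,Var(\bar e_s^1 - \bar e_s^0) + Cov\left(\bar e_s^1 - \bar e_s^0,\; \bar u_s^1 - \bar u_s^0\right) \\
&= \delta\left(Var(\bar e_s^1) + Var(\bar e_s^0)\right) + Cov(\bar e_s^1, \bar u_s^1) + Cov(\bar e_s^0, \bar u_s^0) \\
&= \delta\left(\frac{\sigma^2}{pn} + \frac{\sigma^2}{(1-p)n}\right) + \left(\frac{\rho\sigma\omega}{pn} + \frac{\rho\sigma\omega}{(1-p)n}\right) \\
&= \frac{\delta\sigma^2 + \rho\sigma\omega}{np(1-p)}.
\end{aligned} \tag{A3.5}
\]

Under the assumption of no within-site compliance-effect covariance, $\beta_s = \gamma_s\delta_s$. Thus 

\[
\begin{aligned}
Cov(\hat\gamma_s, \hat\beta_s) &= Cov(\gamma_s, \beta_s) + Cov(g_s, b_s) \\
&= Cov(\gamma_s, \gamma_s\delta_s) + \frac{\delta\sigma^2 + \rho\sigma\omega}{np(1-p)} \\
&= \tau_\gamma\delta + \gamma\,Cov(\gamma_s, \delta_s) + \frac{\delta\sigma^2 + \rho\sigma\omega}{np(1-p)}.
\end{aligned} \tag{A3.6}
\]

Note that 2SLS with site-by-treatment interactions is equivalent to fitting the regression model 

\[
\hat\beta_s = \delta\hat\gamma_s + v_s \tag{A3.7}
\]

via WLS, weighting each point by $W_s = n_s p_s(1 - p_s)$. This yields 

\[
\hat\delta^{(2SLS)} = \frac{\sum_{s=1}^{K} n_s p_s(1 - p_s)\hat\gamma_s\hat\beta_s}{\sum_{s=1}^{K} n_s p_s(1 - p_s)\hat\gamma_s^2}. \tag{A3.8}
\]

Under the assumption that $n_s = n$ and $p_s = p$ for all s, we have 

\[
\hat\delta^{(2SLS)} = \frac{\sum_{s=1}^{K} \hat\gamma_s\hat\beta_s}{\sum_{s=1}^{K} \hat\gamma_s^2}. \tag{A3.9}
\]

Now the expected value of the 2SLS estimator will be approximately equal to the ratio of the 
expected values of the numerator and denominator: 

\[
\begin{aligned}
E\left[\hat\delta^{(2SLS)}\right] &\approx \frac{E\left[\sum_{s=1}^{K}\hat\gamma_s\hat\beta_s\right]}{E\left[\sum_{s=1}^{K}\hat\gamma_s^2\right]} \\
&= \frac{K\left(\gamma\beta + E\left[Cov(\hat\gamma_s, \hat\beta_s)\right]\right)}{K\left(\gamma^2 + \tau_\gamma + \dfrac{\sigma^2}{np(1-p)}\right)} \\
&= \frac{\gamma(\gamma\delta + \tau_{\gamma\delta}) + \tau_\gamma\delta + \gamma\tau_{\gamma\delta} + \dfrac{\delta\sigma^2 + \rho\sigma\omega}{np(1-p)}}{\gamma^2 + \tau_\gamma + \dfrac{\sigma^2}{np(1-p)}} \\
&= \delta + \frac{2\gamma\tau_{\gamma\delta} + \dfrac{\rho\sigma\omega}{np(1-p)}}{\gamma^2 + \tau_\gamma + \dfrac{\sigma^2}{np(1-p)}} \\
&= \delta + \frac{2\gamma\tau_{\gamma\delta}}{\gamma^2 + \tau_\gamma + \dfrac{\sigma^2}{np(1-p)}} + \frac{\rho\sigma\omega}{np(1-p)(\gamma^2 + \tau_\gamma) + \sigma^2}.
\end{aligned} \tag{A3.10}
\]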
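
The two bias expressions can be evaluated numerically. The sketch below is an illustration under stated assumptions, not the code used for the paper: it rewrites (A2.2) and (A3.10) in terms of F using Equation (3), and rewrites the term $2\gamma\tau_{\gamma\delta}/(\gamma^2+\tau_\gamma)$ as $2\,\mathrm{Corr}(\gamma_s,\delta_s)\,\mathrm{sd}(\delta)\,CV_\gamma/(1+CV_\gamma^2)$, which holds for $\gamma > 0$ and finite $CV_\gamma$. The function and argument names are hypothetical.

```python
def predicted_bias(cv, f_stat, corr, sd_delta, n=200, rho=0.5, omega=1.0, sigma=1.0):
    """Approximate OLS and 2SLS bias, from (A2.2) and (A3.10) re-expressed via Equation (3).

    cv       : coefficient of variation of site-level compliance (finite, gamma > 0 assumed)
    f_stat   : expected first-stage F-statistic
    corr     : Corr(gamma_s, delta_s)
    sd_delta : standard deviation of the site-level mediator effects
    """
    # 2*gamma*tau_gd / (gamma^2 + tau_gamma), written in terms of CV (an algebraic rearrangement)
    cec = 2.0 * corr * sd_delta * cv / (1.0 + cv**2)
    ols = rho * (omega / sigma) * n / (f_stat + n - 1.0) + cec * (f_stat - 1.0) / (f_stat + n - 1.0)
    tsls = rho * (omega / sigma) / f_stat + cec * (f_stat - 1.0) / f_stat
    return ols, tsls

# Table 1, case 3: CV = 1, F = 10, Corr = 0.25, sd(delta) = 1
ols_bias, tsls_bias = predicted_bias(cv=1.0, f_stat=10.0, corr=0.25, sd_delta=1.0)
print(f"{ols_bias:.3f} {tsls_bias:.3f}")  # prints: 0.489 0.275
```

For this case the values agree with the predicted biases in columns (10) and (5) of Table 1. The 2SLS expression makes the two components visible: the finite-sample term shrinks with F, while the compliance-effect covariance term is scaled by (F - 1)/F and therefore grows toward its limit as the instruments strengthen.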




A4: Proof that CEC bias is maximized when $CV_\gamma = 1$ 

Equation (6) shows that compliance-effect covariance bias depends linearly on $\dfrac{CV_\gamma}{1 + CV_\gamma^2}$. Let 

\[
f(x) = \frac{x}{1 + x^2}. \qquad \text{Then note that} \qquad f\left(\frac{1}{x}\right) = \frac{1/x}{1 + 1/x^2} = \frac{x}{x^2 + 1} = f(x).
\]

We consider only the case where $x > 0$, 
because the sign of the CV is arbitrary. A plot of $f(x)$ is shown below, indicating that $f(x)$ is 
maximized when $x = 1$. Note that for values of $x$ between 0.5 and 2, the bias is at least 80% of its 
maximum; for values less than 0.25 or greater than 4, the relative bias is less than half its maximum 
possible. 
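
These thresholds can be checked directly. The short sketch below evaluates $f(x) = x/(1+x^2)$ relative to its value at $x = 1$; the function name is hypothetical.

```python
def relative_cec_bias(cv):
    """f(x) = x / (1 + x^2), divided by its maximum f(1) = 0.5."""
    return (cv / (1.0 + cv**2)) / 0.5

# Relative magnitude of the compliance-effect covariance bias at selected CV values
for cv in (0.25, 0.5, 1.0, 2.0, 4.0):
    print(cv, round(relative_cec_bias(cv), 3))
```

The relative bias is 0.8 at x = 0.5 and x = 2, and about 0.47 at x = 0.25 and x = 4, in line with the statements above.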
Figure A4.1 

Relative Magnitude of 2SLS Compliance-Effect Covariance Bias, 
by Coefficient of Variation of Site-Level Compliance 

[Figure not reproduced. Horizontal axis: Coefficient of Variation.] 

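A5: Derivation of Eq 15: 

Equation (14) indicates that $\beta_s$ can be written as a quadratic function of $\gamma_s$ plus a heteroskedastic 
error term: 

\[
\beta_s = \alpha_0\gamma_s + \alpha_1\gamma_s^2 + \gamma_s v_s. \tag{A5.1}
\]

Adding the sampling error in $\hat\beta_s$ to both sides of the equation yields 

\[
\hat\beta_s = \alpha_0\gamma_s + \alpha_1\gamma_s^2 + \gamma_s v_s + b_s. \tag{A5.2}
\]

Taking the expectation, given the estimated $\hat\gamma_s$'s, yields 

\[
E[\hat\beta_s|\hat\gamma_s] = E[\alpha_0\gamma_s|\hat\gamma_s] + E[\alpha_1\gamma_s^2|\hat\gamma_s] + E[\gamma_s v_s|\hat\gamma_s] + E[b_s|\hat\gamma_s]. \tag{A5.3}
\]

Now define $\gamma_s^* = E[\gamma_s|\hat\gamma_s] = \lambda\hat\gamma_s + (1-\lambda)\gamma$, where $\lambda = \tau_\gamma/(\tau_\gamma + \tau_g)$ is the reliability of the $\hat\gamma_s$'s. In 
addition, define $\gamma_s^{2*} = E[\gamma_s^2|\hat\gamma_s] = \gamma_s^{*2} + \tau_\gamma(1-\lambda)$. Then, noting that $v_s \perp \gamma_s$, we have 

\[
E[\hat\beta_s|\hat\gamma_s] = \alpha_0\gamma_s^* + \alpha_1\gamma_s^{2*} + E[b_s|\hat\gamma_s]. \tag{A5.4}
\]

Now, note that 

\[
\begin{aligned}
E[b_s|\hat\gamma_s] &= E[b_s|\hat\gamma_s = \gamma] + \frac{Cov(b_s, \hat\gamma_s)}{Var(\hat\gamma_s)}(\hat\gamma_s - \gamma) \\
&= Cov(b_s, g_s)\,\frac{\hat\gamma_s - \gamma}{\tau_\gamma + \tau_g} \\
&= Cov(b_s, g_s)\,\frac{\lambda(\hat\gamma_s - \gamma)}{\tau_\gamma} \\
&= Cov(b_s, g_s)\,\frac{\lambda\hat\gamma_s + (1-\lambda)\gamma - \gamma}{\tau_\gamma} \\
&= \frac{Cov(b_s, g_s)}{\tau_\gamma}\gamma_s^* - \frac{Cov(b_s, g_s)}{\tau_\gamma}\gamma.
\end{aligned} \tag{A5.5}
\]

Substituting this into (A5.4) and rearranging, we have 

\[
E[\hat\beta_s|\hat\gamma_s] = -\frac{\gamma\,Cov(b_s, g_s)}{\tau_\gamma} + \left(\alpha_0 + \frac{Cov(b_s, g_s)}{\tau_\gamma}\right)\gamma_s^* + \alpha_1\gamma_s^{2*}. \tag{A5.6}
\]

This indicates that if we fit the model 

\[
\hat\beta_s = c + \alpha_0\gamma_s^* + \alpha_1\gamma_s^{2*} + \nu_s, \tag{A5.7}
\]

we will obtain 

\[
\begin{aligned}
E[\hat c] &= -\frac{\gamma\,Cov(b_s, g_s)}{\tau_\gamma} \\
E[\hat\alpha_0] &= \alpha_0 + \frac{Cov(b_s, g_s)}{\tau_\gamma} \\
E[\hat\alpha_1] &= \alpha_1.
\end{aligned} \tag{A5.8}
\]

Recall that, under our assumptions, $\delta = \alpha_0 + \alpha_1\gamma$. This suggests the following estimator for $\delta$: 

\[
\hat\delta = \hat\alpha_0 + \frac{\hat c}{\hat\gamma} + \hat\alpha_1\hat\gamma. \tag{A5.9}
\]

If $\gamma$ is reasonably precisely estimated, then 

\[
\begin{aligned}
E[\hat\delta] &= E[\hat\alpha_0] + E\left[\frac{\hat c}{\hat\gamma}\right] + E[\hat\alpha_1\hat\gamma] \\
&\approx E[\hat\alpha_0] + \frac{E[\hat c]}{E[\hat\gamma]} + E[\hat\alpha_1]E[\hat\gamma] \\
&= \alpha_0 + \frac{Cov(b_s, g_s)}{\tau_\gamma} - \frac{\gamma\,Cov(b_s, g_s)}{\tau_\gamma\gamma} + \alpha_1\gamma \\
&= \alpha_0 + \alpha_1\gamma.
\end{aligned} \tag{A5.10}
\]

However, if $\dfrac{Cov(b_s, g_s)}{\tau_\gamma}$ is small, then fitting A5.7 will yield 

\[
\begin{aligned}
E[\hat c] &\approx 0 \\
E[\hat\alpha_0] &\approx \alpha_0 \\
E[\hat\alpha_1] &= \alpha_1,
\end{aligned} \tag{A5.11}
\]

which suggests that we can fit instead the model 

\[
\hat\beta_s = \alpha_0\gamma_s^* + \alpha_1\gamma_s^{2*} + \nu_s, \tag{A5.12}
\]

and instead estimate $\delta$ as: 

\[
\hat\delta = \hat\alpha_0 + \hat\alpha_1\hat\gamma. \tag{A5.13}
\]

Although the latter model will yield biased estimates, it will be more efficient, and this efficiency 
gain may outweigh the bias (that is, the estimator in A5.13 may have smaller mean squared error 
than that in A5.9). 

Note that Equation A3.5 provides an expression for $Cov(b_s, g_s)$: 

\[
Cov(b_s, g_s) = \frac{\delta\sigma^2 + \rho\sigma\omega}{np(1-p)} = \frac{\gamma^2 + \tau_\gamma}{F - 1}\left(\delta + \rho\frac{\omega}{\sigma}\right). \tag{A5.14}
\]

Thus, the expected value of the intercept in the regression model will be 

\[
E[\hat c] = \frac{-\gamma(\gamma^2 + \tau_\gamma)}{(F - 1)\tau_\gamma}\left(\delta + \rho\frac{\omega}{\sigma}\right) = \frac{-\gamma(1 + CV_\gamma^2)}{(F - 1)CV_\gamma^2}\left(\delta + \rho\frac{\omega}{\sigma}\right). \tag{A5.15}
\]

Note that the intercept will be large, in general, when $\gamma$ is large but $\tau_\gamma$ is small (i.e., when $CV_\gamma$ is 
small). However, in these cases, the sampling variance of both $\hat c$ and $\hat\alpha_0$ will be very large, as the 
regression model in A5.8 will have little variance in the $\gamma_s^*$'s (other than sampling variance, which 
will be non-informative) and estimation of $c$ and $\alpha_0$ will rely on substantial extrapolation. In 
contrast, when $\gamma$ is small and $\tau_\gamma$ is large (i.e., when $CV_\gamma$ is large), the intercept will be close to zero, 
in which case fitting model A5.12 may be sufficient to provide an approximately unbiased estimate 
of $\delta$. 

Because the estimator in A5.9 will be very imprecise in the cases when it is most needed 
(when $CV_\gamma$ is small and F is small), we choose to use the estimator in A5.13 instead, as it has much 
less sampling variance than the former. In simulations not shown we confirmed that the A5.9 
estimator has a larger root mean squared error (often extremely large) than that in A5.13. Based 
on this, we report results in the paper based on the no-intercept estimator. 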
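
The no-intercept approach of A5.12 and A5.13 can be sketched as follows. This is a minimal illustration under explicit assumptions, not the estimation code used in the paper: the regression is fit by unweighted least squares, the mean compliance is estimated by the simple average of the site estimates, and $\tau_\gamma$ and $\tau_g$ are supplied by the user. The function name and arguments are hypothetical.

```python
import numpy as np

def no_intercept_quadratic_estimate(beta_hat, gamma_hat, tau_gamma, tau_g):
    """Sketch of A5.12-A5.13: regress beta_hat on gamma* and gamma^{2*} without an intercept.

    Assumptions (not taken from the paper): unweighted least squares, and the
    overall mean compliance estimated by the simple mean of gamma_hat.
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    gamma_hat = np.asarray(gamma_hat, dtype=float)
    gamma_bar = gamma_hat.mean()
    lam = tau_gamma / (tau_gamma + tau_g)               # reliability of the gamma_hat's
    gamma_star = lam * gamma_hat + (1.0 - lam) * gamma_bar
    gamma2_star = gamma_star**2 + tau_gamma * (1.0 - lam)
    X = np.column_stack([gamma_star, gamma2_star])      # no intercept column
    (a0, a1), *_ = np.linalg.lstsq(X, beta_hat, rcond=None)
    return a0 + a1 * gamma_bar, a0, a1

# Noise-free check: beta_s = 2*gamma_s + 0.5*gamma_s^2 with perfectly reliable gamma_hat
g = np.array([0.5, 1.0, 1.5, 2.0])
delta_hat, a0_hat, a1_hat = no_intercept_quadratic_estimate(2.0 * g + 0.5 * g**2, g,
                                                            tau_gamma=0.3, tau_g=0.0)
print(round(delta_hat, 6), round(a0_hat, 6), round(a1_hat, 6))
```

In the noise-free check the regression recovers the coefficients used to construct the data, so the estimate equals $\alpha_0 + \alpha_1\bar\gamma$.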


Appendix B. Simulation Set-up 

The data used in the simulations presented in this paper are generated through a two-step 
process. In the first step, we generate a set of 50 sites, each characterized by the vector 
$[\gamma_s, \delta_s, \Lambda_s, \Theta_s, n_s, p_s]'$, drawn from a population where 

\[
\begin{pmatrix} \gamma_s \\ \delta_s \\ \Lambda_s \\ \Theta_s \\ n_s \\ p_s \end{pmatrix} \sim \left[ \begin{pmatrix} \gamma \\ \delta \\ 0 \\ 0 \\ n \\ p \end{pmatrix}, \begin{pmatrix} \tau_\gamma & \tau_{\gamma\delta} & 0 & 0 & 0 & 0 \\ \tau_{\gamma\delta} & \tau_\delta & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{pmatrix} \right]. \tag{B.1}
\]

We fix $n_s = n = 200$ and $p_s = p = 0.5$ for all simulations here for simplicity, and set the 
covariances of the site fixed effects in the first and second stage equations ($\Lambda_s$ and $\Theta_s$ in our 
notation) with every other parameter to be zero. The means of $\Lambda_s$ and $\Theta_s$ are arbitrarily set to 0 
and their variances are arbitrarily set to 1, but these means and variances have no impact on the 
bias or precision of any of the estimators discussed here. By manipulating $\gamma$, $\tau_\gamma$, $\tau_\delta$, and $\tau_{\gamma\delta}$, we can 
set $CV_\gamma$, F, Corr($\gamma_s$, $\delta_s$), and sd($\delta$) to the values used in Tables 1 and 2. Specifically, we set 

\[
\gamma = \left(\frac{\sigma^2(F - 1)}{np(1-p)} \cdot \frac{1}{1 + CV_\gamma^2}\right)^{1/2} = \left(\frac{0.02 \cdot (F - 1)}{1 + CV_\gamma^2}\right)^{1/2}
\]
\[
\tau_{\gamma\delta} = (\tau_\gamma \cdot \tau_\delta)^{1/2} \cdot Corr(\gamma_s, \delta_s). \tag{B.2}
\]

These values ensure that the simulations correspond to the scenarios described in Tables 1 and 2.15 

In the second step, we generate 200 observations within each site, each characterized by 
the vector $[\Gamma, \Delta, e_i, u_i]'$. The sample in a site s is drawn from a population where 

\[
\begin{pmatrix} \Gamma \\ \Delta \\ e_i \\ u_i \end{pmatrix} \sim \left[ \begin{pmatrix} \gamma_s \\ \delta_s \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & \sigma^2 & \rho\sigma\omega \\ 0 & 0 & \rho\sigma\omega & \omega^2 \end{pmatrix} \right]. \tag{B.3}
\]

For simplicity, we fix $\sigma^2 = Var(e_i) = \omega^2 = Var(u_i) = 1$ and $\rho = 0.5$ in all simulations. We also set 
$Var_s(\Gamma) = Var_s(\Delta) = 0$ in all sites. Note that this simulation design constrains compliance and 
effect to vary (and covary) only across sites; there is no variance among individuals within a site. 

We then randomly assign 100 observations within each site to receive $T_i = 1$, and the other 
100 to receive $T_i = 0$. We then compute, for each observation, values of the mediator and the 
outcome: 

\[
\begin{aligned}
M_{is} &= \Lambda_s + \Gamma T_i + e_i \\
Y_{is} &= \Theta_s + \Delta M_{is} + u_{is}.
\end{aligned} \tag{B.4}
\]

For each simulation scenario, we repeat this process 2000 times to generate the estimates shown in 
Tables 1 and 2. 
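
The two-step procedure can be sketched in a few lines. The code below is a simplified illustration rather than the simulation code used for the paper. It assumes that the site-level parameters are drawn from a bivariate normal distribution, it omits the site intercepts (which cancel in within-site treatment-control differences), and it computes the 2SLS estimate through the equal-weight ratio form in (A3.9). The function name, argument names, and the reduced number of replications are assumptions made for brevity.

```python
import numpy as np

def simulate_2sls_bias(cv=1.0, f_stat=10.0, corr=0.25, sd_delta=1.0, delta=1.0,
                       K=50, n=200, p=0.5, rho=0.5, reps=200, seed=0):
    """Monte Carlo estimate of the 2SLS bias under the Appendix B design (sigma^2 = omega^2 = 1)."""
    rng = np.random.default_rng(seed)
    # Equation (B.2): mean compliance implied by F and CV
    gamma = np.sqrt((f_stat - 1.0) / (n * p * (1.0 - p)) / (1.0 + cv**2))
    tau_gamma = (cv * gamma) ** 2
    tau_gd = corr * np.sqrt(tau_gamma) * sd_delta
    site_cov = np.array([[tau_gamma, tau_gd], [tau_gd, sd_delta**2]])
    err_cov = np.array([[1.0, rho], [rho, 1.0]])
    n1 = int(n * p)
    T = np.r_[np.ones(n1), np.zeros(n - n1)]         # treatment indicator within each site
    estimates = np.empty(reps)
    for r in range(reps):
        # Step 1: site-level compliance and effect (normality is an assumption here)
        gd = rng.multivariate_normal([gamma, delta], site_cov, size=K)
        g_s, d_s = gd[:, [0]], gd[:, [1]]
        # Step 2: person-level errors, mediator, and outcome
        eu = rng.multivariate_normal([0.0, 0.0], err_cov, size=(K, n))
        M = g_s * T + eu[..., 0]
        Y = d_s * M + eu[..., 1]
        g_hat = M[:, :n1].mean(axis=1) - M[:, n1:].mean(axis=1)   # site compliance estimates
        b_hat = Y[:, :n1].mean(axis=1) - Y[:, n1:].mean(axis=1)   # site ITT estimates
        estimates[r] = (g_hat * b_hat).sum() / (g_hat**2).sum()   # ratio form (A3.9)
    return estimates.mean() - delta

bias = simulate_2sls_bias()
print(round(bias, 3))
```

With the default settings (CV = 1, F = 10, Corr = 0.25, sd(δ) = 1), the simulated bias should fall near the predicted value of 0.275 for the corresponding case in Table 1, up to Monte Carlo error.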

Appendix C: Additional Comparisons Among OLS, 2SLS, and the Bias-Corrected Estimator 

Figures 2 and 3 present simulation results for the OLS, 2SLS, and bias-corrected estimators 
as $CV_\gamma$ deviates from 1 towards 0. Figures C-1 and C-2 show the same results for the three estimators 
of interest as $CV_\gamma$ deviates from 1 towards infinity. Patterns observed in these cases closely mirror 
those in Figures 2 and 3. 

15 The 0.02 term in Equation (B.2) comes from the fact that we set $\sigma^2 = 1$ below. 

Figure C-1: Bias and RMSE of Four Estimators by F-statistic and $CV_\gamma$, when Corr($\gamma_s$, $\delta_s$) = 0.25 
[Figure not reproduced. Panels: $CV_\gamma$ = 1; $CV_\gamma$ = 5; $CV_\gamma$ = infinity. Vertical axis: Bias. Series: Bias-Corrected Estimator, 2SLS Estimator, OLS Estimator, Plug-in Bias-Corrected Estimator.] 

Figure C-2: Bias and RMSE of Three Estimators by Corr($\gamma_s$, $\delta_s$) and $CV_\gamma$, when F-statistic = 26 
[Figure not reproduced. Vertical axis: Bias.] 
Appendix D: Estimating the correlation between $\gamma_s$ and $\delta_s$ 


Equation (7) implies that 

\[
\alpha_1 = \frac{Cov(\gamma_s, \delta_s)}{\tau_\gamma}
\]
\[
Cov(\gamma_s, \delta_s) = \alpha_1\tau_\gamma
\]
\[
Corr(\gamma_s, \delta_s) = \alpha_1\sqrt{\frac{\tau_\gamma}{\tau_\delta}}.
\]

This implies that we can estimate Corr($\gamma_s$, $\delta_s$) if we can estimate $\alpha_1$, $\tau_\gamma$, and $\tau_\delta$ reasonably precisely. 
We obtain $\hat\alpha_1$ from fitting Equation (13), and we obtain $\hat\tau_\gamma$ from the random-coefficients first-stage 
model (Equation 10). Estimating $\tau_\delta$ is not as straightforward. We estimate $\tau_\delta$ using the methods 
described in Raudenbush, Reardon, and Nomi (2012). The resulting estimates of Corr($\gamma_s$, $\delta_s$) are 
shown in Table 3. 
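
A minimal numerical sketch of this calculation follows. The function name is hypothetical; the inputs are the Project STAR math estimates of $\alpha_1$, $\tau_\gamma$, and $\tau_\delta$ reported in Table 3.

```python
import math

def corr_gamma_delta(alpha1, tau_gamma, tau_delta):
    """Corr(gamma_s, delta_s) = alpha_1 * sqrt(tau_gamma / tau_delta)."""
    return alpha1 * math.sqrt(tau_gamma / tau_delta)

# Project STAR math column of Table 3
star_math_corr = corr_gamma_delta(alpha1=-0.312, tau_gamma=3.45, tau_delta=5.814)
print(f"{star_math_corr:.3f}")  # prints: -0.240
```

The result matches the estimated correlation of -0.240 reported for Project STAR math in Table 3.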